The UW Data Science faculty are an interesting lot, highly credentialed and accomplished in a wide range of fields. Here is your chance to get to know them better. Find out what these experts have to say about the data science field, how they got started in their own disciplines, their words of wisdom for aspiring data science professionals, and more.
Why is big data getting so much attention right now?
David Reineke: Because big data is part of our lives now. Computers, sensors, the Internet, and today's technology give us big data, and governments and corporations want to use it to inform their decisions.
Erik Krohn: Almost five years ago, Google's Eric Schmidt claimed that we were generating as much data every two days as had been created from the dawn of civilization up until 2003. We have certainly been creating more data per day since then, and that volume will only continue to grow. On its own, this massive amount of data is usually pretty useless. This is where data science comes into play: data scientists can analyze all of that data and answer important questions.
Zamira Simkins: We live in an information age. Essentially every keystroke generates data. With so many technology users around the world, companies are accumulating enormous amounts of data. By itself the accumulated data may not be very useful, but once it is combined with other data, processed, and analyzed, it can provide valuable insights on past performance, inform business decisions, and help make future projections.
What is your particular area of specialization and why is it important to the data science curriculum?
Abra Brisbin: My research is in statistical genetics, which means I’m frequently working with large data sets to develop statistical methods for understanding how genes relate to disease. I need to be analytical, but also flexible and creative to find out as much as I can from a data set that may contain multiple genes interacting with each other and with the environment, related and unrelated individuals from multiple populations, and messy data containing genotyping errors. I use the statistical programming language R to prepare data sets for analysis and to develop new analytical methods.
Computer programming is essential in data science because with large data sets, there’s no way you can hold all of the information in your head at once to understand or analyze the data; you need a computer. Moreover, if you’re just using a computer program written by someone else, then you’re stuck using the methods and ideas that someone else had, which may not be exactly what’s appropriate for your data or your question. Learning computer programming gives you the power to get the computer to do the analysis that you think is important.
R is a great language for learning computer programming. It’s free, which means you’ll be able to use it no matter where you go. It’s very widely used, which means there are lots of resources available to help you continue learning more specialized skills after you finish DS 710: Programming for Data Science, so you can tailor your learning to your needs. Finally, it’s a high-level programming language, which means that it has built-in functions to do many of the things you’re likely to want to do. That means you can jump right in and start programming without having to worry about things like memory management.
David Reineke: Applied statistics, which is important in the data science curriculum because a fundamental understanding of random variables and relationships among them plays a crucial role in the life of a data scientist.
Erik Krohn: I am a computer scientist, and my area of specialization is algorithms. I am very interested in coming up with different ways to solve problems. The obvious improvements are finding a faster solution or finding a solution that uses less space. Most problems have an easy solution, but that solution is often too slow to be feasible. There are almost always more clever ways to solve a problem using less time and less space. Since big data deals with huge amounts of data, coming up with faster algorithms is very important.
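To make the time-versus-space trade-off concrete, here is a small illustrative sketch. The problem (checking a list for duplicate values) is our own choice of example, not one Professor Krohn names, and the function names are hypothetical. The easy solution compares every pair; the cleverer ones sort first or remember what has already been seen, buying speed with extra memory.

```python
from typing import Hashable, Sequence


def has_duplicate_naive(values: Sequence[Hashable]) -> bool:
    """Easy solution: compare every pair. O(n^2) time, O(1) extra space."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] == values[j]:
                return True
    return False


def has_duplicate_sorted(values: Sequence[int]) -> bool:
    """Sort a copy, then compare neighbors. O(n log n) time, O(n) space for the copy."""
    ordered = sorted(values)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicate_hashed(values: Sequence[Hashable]) -> bool:
    """Remember what has been seen. O(n) expected time, O(n) extra space."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False


data = [7, 3, 9, 1, 3]
print(has_duplicate_naive(data), has_duplicate_sorted(data), has_duplicate_hashed(data))
# prints: True True True
```

On a few values all three finish instantly; on billions of records, the difference between n^2 and n steps is the difference between feasible and not.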
Ursula Whitcher: My background is in pure mathematics; my research specialty is the study of higher-dimensional geometric spaces important in theoretical physics. The growing field of algebraic statistics applies similar techniques to identify patterns in multidimensional real-world data. I use and develop open-source software for mathematical experimentation. I’m also interested in using data science techniques to answer questions about diversity and underrepresentation in science.
Zamira Simkins: I am an economist and work with data on a daily basis. A lot of “big data” is economic in nature—for example, consumer surveys, product prices, and sales. Economics, at least from the business perspective, is important to the data science curriculum because it explains the underlying factors behind such data. Specifically, economics explains the human behavior and decisions that generated the data. Understanding these factors is critical to selecting the right data-science techniques and making relevant inferences.
How and why did you get started in your field?
Abra Brisbin: I was a math major in college, and I really liked applying mathematics to other areas of science. I did an independent study on probability models of DNA, which I really enjoyed. So, I decided to go to grad school to study applied mathematics.
Zamira Simkins: I used to be a stockbroker. To gain a competitive edge, I started developing stock forecasting models to inform and speed up my securities trading decisions.
What aspects of data science interest you most right now?
Erik Krohn: My interest lies in the computing aspect of data science. Computers are fast … but they aren't that fast. For instance, one problem I give my students is the traveling salesman problem. Suppose a salesman must visit 25 cities, starting in city A. The salesman wants to travel as little as possible, so the question becomes: which route minimizes the total distance traveled? An obvious solution is to calculate the distance of every possible route. It's only 25 cities, so it can't take long, right? In fact, the average desktop computer would take years to check every single route. This is where my interest comes into play. We don't have just 25 cities; we have terabytes and petabytes of data to sort through and deal with. How does one do this quickly? That's what I care about.
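The "obvious solution" above can be sketched in a few lines of Python, along with a count of how many routes it would have to check. This is an illustrative sketch under stated assumptions: cities are hypothetical (x, y) points, a route is an open path starting at city 0 (the problem as described does not say the salesman returns home), and the checking speed of one billion routes per second is an assumed, generous figure for a desktop computer.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) city coordinates


def route_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Total distance of visiting the points in the given order (no return trip)."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def brute_force_route(points: Sequence[Point]) -> Tuple[List[int], float]:
    """Check every ordering of cities 1..n-1 after starting city 0: exact, but O(n!)."""
    best_order, best_len = [0], math.inf
    for rest in itertools.permutations(range(1, len(points))):
        order = [0, *rest]
        length = route_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


def routes_to_check(n_cities: int) -> int:
    """Number of orderings once the starting city is fixed: (n - 1)!."""
    return math.factorial(n_cities - 1)


# Fine for a handful of cities...
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(brute_force_route(square))

# ...but hopeless for 25 of them.
routes = routes_to_check(25)
checks_per_second = 1e9  # assumption: one billion routes per second
years = routes / checks_per_second / (365.25 * 24 * 3600)
print(f"{routes:,} routes, about {years:,.0f} years")
```

With 25 cities there are 24! = 620,448,401,733,239,439,360,000 orderings to try, which works out to roughly twenty million years at the assumed speed.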
This video visually compares Greedy, Local Search, and Simulated Annealing strategies for solving the Traveling Salesman problem. Credit: James Kolpack.
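Two of the strategies named in the video, a greedy approach and local search, can be sketched briefly. These are common textbook versions (nearest-neighbor greedy and 2-opt local search), not necessarily the exact variants shown in the video; simulated annealing is omitted. The same assumptions as before apply: hypothetical (x, y) points and an open path starting at city 0. Neither method guarantees the shortest route, but both run in polynomial rather than factorial time.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) city coordinates


def path_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Length of the open path that visits the points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def nearest_neighbor(points: Sequence[Point], start: int = 0) -> List[int]:
    """Greedy: always travel to the closest unvisited city. O(n^2), not guaranteed optimal."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points: Sequence[Point], order: List[int]) -> List[int]:
    """Local search: reverse a segment whenever that shortens the path.

    The starting city (index 0 of the order) stays fixed. The loop stops at a
    local optimum, which may still be longer than the best possible route.
    """
    best = order[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                # Require a strict improvement so the loop always terminates.
                if path_length(points, candidate) < path_length(points, best) - 1e-12:
                    best, improved = candidate, True
    return best


cities = [(0.0, 0.0), (1.0, 0.0), (-1.1, 0.0), (2.0, 0.0), (0.5, 1.0)]
greedy = nearest_neighbor(cities)
polished = two_opt(cities, greedy)
print(greedy, round(path_length(cities, greedy), 3))
print(polished, round(path_length(cities, polished), 3))
```

The design choice is typical of practical algorithm work: start with a fast, "good enough" answer, then spend a bounded amount of extra effort improving it.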
Zamira Simkins: As an economist, I am interested in practical applications of data science that can lead to improvements in social well-being.
Who or what has influenced you most in your career?
David Reineke: I had excellent dissertation advisors at the Air Force Institute of Technology, particularly the late Albert H. Moore, whose counsel, friendship, and wisdom saw me through those PhD years.
Zamira Simkins: My parents and the environment I grew up in. I grew up in a transition economy where nobody had much wealth or many resources. However, I was always told that with education, hard work, and dedication, I could build a successful professional career and a good life for myself and my family.
What are your favorite websites to visit? (Not necessarily data-science related.)
Do you have a favorite example of how data science has improved a certain industry or discipline? Or a favorite example of data visualization? Please explain.
David Reineke: A close friend of mine who works in business intelligence in a large corporation was just appointed as the director of a new division called the Center for Excellence in Predictive Analytics and is hiring data scientists for his team.
Erik Krohn: Algorithms that help you decide what you may "like" or "want" are fascinating to me. For instance, if I watch a movie on Netflix or buy something on Amazon, they will recommend another movie I might like or another product I may want. There are occasional laughable suggestions, but for the most part, the companies' suggestions are spot-on.
Zamira Simkins: I think a good example of how data science is improving our lives is Amazon's product recommendations. Amazon uses your individual product searches and prior purchases to showcase products that may interest you. I have personally found these recommendations very useful at times, as they remind me to stock up on household items I buy regularly.
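Both answers describe recommendation systems. As a rough illustration of the idea, here is a minimal "customers who bought X also bought Y" sketch based on item co-occurrence counts. This is not Netflix's or Amazon's actual method, which neither answer describes; the purchase histories and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical purchase histories: customer -> set of items bought.
purchases: Dict[str, Set[str]] = {
    "ana": {"coffee", "filters", "mug"},
    "ben": {"coffee", "filters"},
    "cai": {"coffee", "mug", "tea"},
    "dee": {"tea", "honey"},
}


def co_occurrence(histories: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items appears in the same customer's history."""
    counts: Dict[str, Counter] = {}
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(user: str, histories: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Score items the user doesn't own by how often they co-occur with items they do own."""
    counts = co_occurrence(histories)
    owned = histories[user]
    scores: Counter = Counter()
    for item in owned:
        for other, n in counts.get(item, Counter()).items():
            if other not in owned:
                scores[other] += n
    # Highest score first; break ties alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


print(recommend("ben", purchases))
# prints: ['mug', 'tea']
```

Ben bought coffee and filters; other customers who bought those items also bought a mug (three times) and tea (once), so those are suggested. Production systems add far more signals, such as searches, ratings, and recency, but the core idea of learning from other customers' behavior is the same.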
Which data-driven person or company do you admire most?
Ursula Whitcher: I’m a huge fan of Cathy O’Neil, a mathematician and data scientist who blogs at mathbabe.org. Dr. O’Neil writes provocatively about the way algorithms can be used to promote justice, or conversely to entrench the status quo. She also has some great advice about getting started with data science projects!
What are one or two of your predictions about the impact of data science over the next 10 or 20 years?
Erik Krohn: It will be great to see data science expand into many different areas that we can’t even imagine now. For example, one area that I think will take off is medicine. My hope is that we will see medical advances much more quickly than in the past because we are able to store and analyze so much more data.
Zamira Simkins: I think data science will play a critical role in increasing our labor productivity and economic growth in the future.
What words of wisdom do you have for students entering this program and the field?
David Reineke: Stick with it … you're going to enjoy it!
Zamira Simkins: Data science professionals are and will be in huge demand. This program offers a great foundation for a professional career in the field. Upon graduation, take calculated risks to achieve the goals you desire.
To find out more about the UW Master of Science in Data Science and how you can apply to this 36-credit program, call a friendly enrollment adviser at 1-877-UW-LEARN (895-3276) or email firstname.lastname@example.org today.