When it comes to big data, one thing seemingly everyone can agree on is that organizations face a shortfall of data science talent. After all, the ideal data scientists aren't just wunderkinds in advanced mathematics and statistics; they're also creative, non-linear thinkers with excellent communication skills. In popular parlance, they're unicorns: magical creatures that don't exist.
Research firm McKinsey has predicted that by 2018, the U.S. alone may face a 50 percent to 60 percent gap between supply and demand of deep analytic talent.
Bob Rogers, chief data scientist, Big Data Solutions, Intel, might just fit the unicorn bill. Rogers, who holds a PhD in Physics from Harvard University, got his start studying supermassive black holes. He co-wrote a book on time series forecasting using artificial neural networks, which led him to co-found a quantitative futures hedge fund that leveraged large amounts of historical and streaming tick-by-tick data from markets. He's also helped a medical technology firm revolutionize glaucoma diagnostics and co-founded another business to help the healthcare industry extract data from electronic health records.
For the past year, he's been the chief data scientist at Intel's Big Data Solutions, which started as a project to better leverage the data inside Intel but has grown to encompass helping Intel clients better understand analytics and data problems. For instance, he's been working closely with the Knight Cancer Institute at Oregon Health & Science University (OHSU) to help develop the Collaborative Cancer Cloud, which aims to make it possible to sequence an individual's cancer, analyze it and formulate a precision treatment plan within 24 hours.
You'll never find that data science unicorn
In the process, Rogers has helped to define what Intel looks for in its data scientists, and it's not unicorns who combine a background in math, statistics, physical science or another hard science with the ability to write production-level code and the ability to talk to business people in their own language.
"You don't have to be a unicorn," he says. "We're looking for people who have one of the major skill sets and some comfort level with the others — the ability to be creative, handle ambiguity and communicate well. One of the key outputs of that kind of thinking is the ability to characterize what's important to others."
Think in terms of data science teams with diverse capabilities that complement each other, rather than seeking to hire individuals who can do it all, Rogers says.
"It's true that having advanced knowledge of mathematics and programming is fantastic background for a data scientist," he says. "But, in any company, you won't find just one data scientist doing it all — just like Michael Jordan couldn't have scored so many points without Scottie Pippen at his side, data scientists all bring their own skills to the table that together build an ideal team."
"In fact, we're looking for all kinds of skills and backgrounds as we look to build our team at Intel — from programmers to those with creativity, curiosity and those with great communication skills," he adds. "It's rare to find a 'data unicorn' that can do it all, and we're not spending our time recruiting for such a talent. We build out teams to reflect a variety of backgrounds and experience, which brings greater insight to our data analytics work."
Getting your hands dirty with data
Because there's no one-size-fits-all path to becoming a data scientist, it can be difficult to identify good candidates. Rogers' advice is to look for individuals who can show their mettle by getting their hands on a data set (perhaps from Kaggle, DataKind or the government), building up a data analytics environment and telling a story with that data. Likewise, individuals interested in pursuing data science should take it upon themselves to seek out data sets to work with.
"Get your hands on data and do something with it," he says. "There are big data sets out there, some of them sufficiently ugly enough to give you some real experience working with data. Take a big data set, put it into an environment where you can really do something with it and answer a simple question. You don't need to do anything too technically crazy. When you work with data that's messy, you start to see where data sets go wrong. That is the moment you start to speak the data scientists' language."