By James Kobielus
Big Data is a bit like our solar system. It’s a brilliant system of information and analysis that emerges from the inchoate gas, dust, rocks and crystals known as “data.” Cloud computing is the galaxy wherein the stars, rocks, and particles exist and interact.
To play this analogy out, data scientists would be the astronomers. They’re the ones who explore the spinning, interconnected system, much of which consists of scattered matter that we lump together under the term “unstructured.”
But what exactly is a data scientist? Simply put, the data scientist is among the most important developers in Big Data. The discipline includes statistical analysts, data miners, predictive modelers, computational linguists, and other professionals whose job is to find deep insights in large, complex data sets. You can’t unlock the full value of Big Data in your business if you don’t bring together your best and brightest data scientists and give them the tools they need to do their jobs with maximum productivity.
While you don’t need a Ph.D. in statistics to be a data scientist, you do need curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor and a skeptical nature. You must also be articulate, because no one will accept the validity of the patterns you surface if you can’t explain clearly how you built your model, what variables and data you used, or what the results truly mean in the context of either a business problem or a scientific endeavor.
Though some in the industry are concerned about a shortage of business-oriented data scientists, research shows the opposite. Smart people are flocking to the data-scientist profession in greater numbers to advance their careers and to be leaders in solving complex global issues. And academia is responding with new coursework and programs designed to train and educate the next generation of technorati.
For example, IBM’s data scientists are using Big Data analytics to help manage precious water resources in South Africa, Florida, Indiana, and many more places. Eoin Lane is a Smarter Water architect who’s working on finding other ways that data and technology can solve water problems around the world.
Another IBM scientist, Michael Haydock, chief scientist at IBM Global Business Services, is helping retail brands forecast sales trends up to four times more accurately using 22 years of historical data. Researchers at the IBM Customer Experience Lab have launched a prototype technology called the Virtual Closet, which uses data to customize recommendations for customers based on items they’ve recently purchased or shown interest in.