Today, researchers in the life sciences are required to work with and analyze gigabyte- and terabyte-size data sets. Similarly, students on university campuses walk around with hard drives in their backpacks holding terabytes of research data. Much of this data arrives at variable speeds and in different formats, fueled by a new generation of high-throughput data production technologies such as DNA sequencers and super-resolution microscopes.
In many ways, big data has been around us for a long time. Phone call records, credit card transactions and financial trading logs have been creating information and challenges for decades. What is new is the proliferation of sources: social networks, logs of Internet activity, e-commerce, video, image and SMS records now require organizations to expand their capabilities to integrate and manage these large data sets. The fact that 80 percent of this information is unstructured adds another dimension to the challenge, one we also face in life sciences research.
The ability to work with and analyze this big data is an essential skill that every life scientist must possess. But how do you prepare students for emerging roles like that of the data scientist? You give them the audacity, and the training, to formulate research hypotheses that push big data technologies beyond current capabilities.
Data scientists go beyond simply capturing, cleaning, securing, and analyzing big data; they need to interpret it for a greater good.
As educators, we must provide our students with a strong interdisciplinary skill set, including hands-on experience with the technologies and tools that will be key to leveraging big data. This means today’s graduates need a mix of technical and business skills that are highly sought after in academia and industry.
Our efforts at the iPlant Collaborative are directed towards building cyber-infrastructure that enables plant biologists to efficiently analyze large data sets, and we are using the familiar open source technologies that power many of our big data initiatives. Our goal is to give students access to, and training on, every level of our infrastructure that is designed to leverage big data.
I recently invited Anjul Bhambhri, IBM’s vice president of big data, to speak about big data skills at our ongoing technology lecture series at iPlant. During the virtual lecture, she discussed the many ways that businesses are analyzing big data, such as for fraud protection, neonatal analysis, traffic control, customer retention, and clean energy.
Anjul also answered questions from current students and professionals about the opportunities available to those interested in pursuing big data careers. The view from the classroom is optimistic, as nearly every industry can benefit from employees who assume the role of data scientist.
By working closely with industry partners like IBM, we can ensure that our graduates are well versed in big data skills before they enter the workforce.
To learn more about our own big data initiatives at the iPlant Collaborative, visit http://www.iplantcollaborative.org/discover/data-store. To learn more about our project, go to http://www.iplantcollaborative.org/.
Read IBM Vice President of Big Data Anjul Bhambhri on "What Is a Data Scientist?"