Instrumented Interconnected Intelligent

Bob Picciano, Sr. VP, IBM Analytics

By Bob Picciano

Over the weekend, a room full of top developers competed in a hackathon in San Francisco, vying for bragging rights while coding on top of the Spark data-processing engine. The winners will be announced later, but based on the results of an internal IBM hackathon a few weeks ago, I can give you the bottom line: these competitions show that Spark could shake up data analytics just as the Linux operating system blew the lid off the Internet a decade ago.

Today, large-scale data processing is available mainly to corporations, government agencies and universities. Spark, an open source software project under the Apache Software Foundation umbrella, has the potential to place these capabilities at the fingertips of all types of people and organizations all over the world. The goal: deeper and faster insights.

IBM is announcing today that we’re backing Spark by committing more than 3,500 researchers and developers to work on Spark-related innovations and to collaborate with the Spark open-source community to enhance the technology and push it in new directions. We’re going to embed Spark into our analytics and commerce platforms. And we’re contributing our SystemML machine learning technology to the Spark community.

When IBM put its muscle behind Linux in 1999, that move marked the beginning of Linux’s ascendancy in corporations and Internet-class data centers. The same sort of thing could happen now with Spark.

Already, Spark is helping enterprises transform the way they do business. For instance, Independence Blue Cross, a health insurer serving 7 million people nationwide, uses Spark to accelerate collaboration between its own researchers and academic partners with the goal of getting new claims and benefits apps built and available to customers much faster.

Matei Zaharia, asst. professor, MIT

I’m guessing that many people reading this post have never heard of Spark, so let me tell you a little about it. The technology was invented in 2009 by a team of researchers at the University of California at Berkeley, led by a Romanian computer genius, Matei Zaharia. They were searching for ways to speed up the processing of unstructured data, meaning information that’s not organized in the columns and rows of a traditional database.

In the past, it was very difficult to analyze large quantities of such data. Then along came a technology called Hadoop, which made it easier to process the data using clusters of computers. Spark is a younger cousin to Hadoop. The technology is particularly good at analyzing data held in computer memory rather than on disk, improving performance by as much as 100 times in some cases. It’s especially useful for running machine learning algorithms, many of which make repeated passes over the same data.
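
A minimal pure-Python sketch of why keeping data in memory helps iterative machine learning. This is not Spark code: `load_from_disk()` is a hypothetical stand-in for a slow read from storage, and the "algorithm" is a one-parameter linear fit by gradient descent, which passes over the same data on every iteration. Re-reading the data on each pass costs one read per iteration; reading it once and reusing the in-memory copy, roughly the role Spark’s `cache()` plays for a dataset spread across a cluster, costs a single read and gives the same answer.

```python
# Minimal sketch (not Spark): why keeping data in memory helps iterative
# machine learning. load_from_disk() is a hypothetical stand-in for a slow read.

disk_reads = 0

def load_from_disk():
    """Pretend to read (x, y) pairs from disk; counts each read."""
    global disk_reads
    disk_reads += 1
    # Points on the line y = 3x, so the best-fit slope is 3.
    return [(float(x), 3.0 * x) for x in range(1, 11)]

def fit_slope(get_data, iterations=200, lr=0.001):
    """Fit y = w * x by gradient descent on mean squared error.

    get_data is called once per iteration, mimicking one pass over the data.
    """
    w = 0.0
    for _ in range(iterations):
        data = get_data()
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Disk-based style: re-read the data on every pass.
disk_reads = 0
w_disk = fit_slope(load_from_disk)
reads_without_cache = disk_reads

# In-memory style: read once, then reuse the cached list on every pass.
disk_reads = 0
cached = load_from_disk()
w_mem = fit_slope(lambda: cached)
reads_with_cache = disk_reads

print(f"slope {w_disk:.3f} with {reads_without_cache} reads")
print(f"slope {w_mem:.3f} with {reads_with_cache} read")
```

Both runs reach a slope of about 3.0; only the read count differs, 200 versus 1.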

Spark doesn’t just make it possible to crunch huge amounts of data really fast; it also enables developers to innovate rapidly. That quality was amply demonstrated at our internal Spark hackathon a few weeks ago. Thousands of IBM programmers came to an internal Web site to learn about Spark. We gave them three weeks to form teams and develop “moon shot” projects. They responded energetically, producing 100 impressive applications: software that could really matter in the world.

We didn’t give our programmers any training in using Spark before they plunged into the hackathon, and that points to another of the technology’s winning attributes: ease of use. It’s easy to learn, easy to program with and easy to port algorithms to.
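
Part of Spark’s appeal for newcomers is its programming model: a job is written as a short chain of transformations on a dataset. The sketch below imitates that style in plain Python to count words, using a hypothetical `MiniRDD` class that is not part of Spark and runs on a single machine. In PySpark the corresponding chain would use the real RDD methods `flatMap`, `map` and `reduceByKey`, with Spark distributing the work across a cluster.

```python
class MiniRDD:
    """Hypothetical, single-machine imitation of Spark's chained RDD style."""

    def __init__(self, items):
        self.items = list(items)

    def flat_map(self, fn):
        # Apply fn to each item and flatten the resulting iterables.
        return MiniRDD(out for item in self.items for out in fn(item))

    def map(self, fn):
        # Apply fn to each item, producing one output per input.
        return MiniRDD(fn(item) for item in self.items)

    def reduce_by_key(self, fn):
        # Combine the values of (key, value) pairs that share a key.
        groups = {}
        for key, value in self.items:
            groups[key] = fn(groups[key], value) if key in groups else value
        return MiniRDD(groups.items())

    def collect(self):
        # Return the items as a plain list.
        return list(self.items)


lines = ["spark is fast", "spark is easy to learn"]
counts = dict(
    MiniRDD(lines)
    .flat_map(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduce_by_key(lambda a, b: a + b)
    .collect()
)
print(counts)
```

Each step returns a new dataset, so the whole job reads as one pipeline; that chaining is the style a newcomer picks up quickly.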

Most people call Spark a data analytics engine or a programming framework, but I see things a little differently. To me it’s really an analytics operating system. Like Linux, it’s a foundation upon which developers of all types, from startups to giant corporations, can build applications. We’re making it even easier for developers to build applications using Spark by hosting it on our Bluemix cloud-development platform. We’re also committed to helping train at least 1 million data scientists and data engineers on the Spark technology.

Spark is already one of the most dynamic open source communities, and I believe it could become the most important open source project globally over the next decade. This technology has great potential to accelerate the pace of innovation in data analytics. IBM wants to help our clients and partners make the most of it.

May 29, 2016
5:28 am

A great article! Very informative and to the point

Posted by: Munyole
May 25, 2016
3:23 am

This sounds so great!!

Posted by: Alice Ngigi
May 25, 2016
3:22 am

This looks amazing.

Posted by: Alice Ngigi
May 25, 2016
3:20 am

Good information.

Posted by: Alice Ngigi
May 25, 2016
3:19 am

Thanks for sharing.

Posted by: Alice Ngigi
May 25, 2016
3:18 am

Great information. Thanks for the post.

Posted by: Alice Ngigi
May 25, 2016
3:18 am

Very good information.

Posted by: Alice Ngigi
May 25, 2016
3:17 am

This is interesting.

Posted by: Alice Ngigi
May 17, 2016
3:31 pm

Revolutionary! WOW! Keep up the great job.

Posted by: Psych Nairo
May 5, 2016
1:31 am

Love the article, very informative.

Posted by: Kioi
April 11, 2016
12:58 am

The video is really great.

Posted by: obat hernia
April 8, 2016
6:49 pm

Excellent post, friend. Congratulations.

Posted by: dragon ci
March 2, 2016
1:24 am

I blog frequently and I truly appreciate your information. The article has really piqued my interest. I will take a note of your website and keep checking for new information about once per week. I subscribed to your feed as well.

Posted by: geburtstagswunsche fur schwester
February 12, 2016
5:00 am

Thanks for another great article. Where else could anybody get that kind of information, written in such a perfect way? I have a presentation next week, and I’m searching for such information.

Posted by: geburtstagswunsche fur schwester
January 22, 2016
3:16 pm

I read this piece of writing completely, concerning the resemblance of the latest and preceding technologies; it’s a remarkable article.

Posted by: profitable business online
January 6, 2016
2:47 am

I have seen and experienced the role played by open source software. They are great for businesses.

Posted by: Alex
November 7, 2015
5:54 am

interesting info

Posted by: تست جوش
November 4, 2015
4:40 am

Open Source is hard

Posted by: درب ضد سرقت
July 30, 2015
12:26 pm

Thanks for this insightful introduction to Spark. My world-wide team will put this to work in our partnership with the IBM Spark team and the Executive Briefing Program.

Posted by: Bruce Williams
July 16, 2015
5:05 pm

Very exciting!! I can’t wait to get my hands on Spark and use it for development. Has anybody looked at it in Bluemix? I don’t see it there.

Posted by: Deepika
June 30, 2015
6:23 am

Thank you for the information. Good to know about Spark and its importance.

Posted by: Vijetha Reddy
June 30, 2015
2:38 am

Thanks for this article. Very informative.

Posted by: Veena
June 25, 2015
7:16 am

Glad to hear about Spark, an OS to analyze data. Interesting. Thank you.
Now my thought: as we have different (human) languages for different cultures/countries, we need different OS-like foundations for different business/organization or data processing needs.

Posted by: Tomy
June 19, 2015
2:47 am

This is so encouraging, because it also enables developers to innovate rapidly. Great.

Posted by: mercy nyambura
June 18, 2015
6:25 am

Great job Spark for helping enterprises transform the way they do business.

Posted by: Dennis
June 17, 2015
6:47 am

It’s a new concept in the marketing revolution to store all data in one piece of software by IBM.

Posted by: Father's day
June 17, 2015
3:19 am

Learning this for the first time. Great job Matei Zaharia. A step centuries ahead in the inventions worldwide.

Posted by: festus
June 16, 2015
12:14 pm

Is there a community of sociologists, anthropologists, psychologists studying or discussing the ramifications of big data conclusions, use and dissemination?

Posted by: om
June 15, 2015
5:40 pm

Apache Spark Community Event Livestream. Tune in and hear how IBM and Spark are driving insights and accelerating Spark innovation. Watch the Livestream at 7:00 p.m. – 8:45 p.m. (PDT) on 15 June 2015. Replay will be available shortly after.

Posted by: Maria Diecidue
June 15, 2015
3:39 pm

Very cool! Great learning about Spark and being at the front of this innovation

Posted by: Diane Mench
June 15, 2015
2:51 pm

Thanks for the article. Go Spark. Watching this space to see how it turns out. Front row seats to the next big thing.

Posted by: Psych Nairobi
June 15, 2015
2:29 am

This is a wonderful step for millions around the world who rely on open source data for research & training. Thank you for posting this article and introducing Spark to the world.

Posted by: Cavs
June 15, 2015
2:24 am

This is interesting, great information there.
Thank you

Posted by: ps
1 Trackback
November 24, 2015
8:00 am

[…] open sourced SystemML in June when we threw our weight behind the Apache Spark project—which enables developers and data scientists to more easily integrate Big Data analytics into […]

Posted by: THINK Introducing a Universal Translator for Big Data and Machine Learning