By Mitesh Vasa
All-rounder James Faulkner was scoring well before the double-wicket maiden that clinched Australia’s win over New Zealand in the 2015 Cricket World Cup final last month.
He was scoring with data, or maybe more appropriately, with #ScoreWithData, IBM’s social media insight analysis into players, teams, matches, brands, cities, and fans.
By the end of the six-week-long event played across Australia and New Zealand, Faulkner’s 30 percent “buzz” of 1 million tweets made him the online MVP, well before he earned player of the match versus Cup co-host New Zealand.
Sports provide opinionated natural language data, ripe for machine learning opportunities. That’s one of the reasons our team at IBM Research’s lab in India customized IBM BigInsights’ “social data accelerator” plug-in to scan Twitter for all things Cricket World Cup.
All told, we scanned between 700 and 800 keywords per match, ranging from obvious ones like the names of players, referees, and stadiums to cricket-specific technology terms like “spidercam” and “UDRS.”
We also scanned several hashtags like the broadly used #cwc15, and match-specific ones like #INDvPAK, among many others, and Twitter handles of popular cricket players, sports journalists covering the Cup, retired players and cricket organizations.
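To make the idea concrete, here is a minimal sketch of this kind of filter. It is not the BigInsights accelerator itself: the tiny watch list, the tokenizing pattern, and the function names `matched_terms` and `is_relevant` are all assumptions made for illustration.

```python
import re

# Hypothetical watch list for illustration only; the real list held
# 700 to 800 terms per match.
KEYWORDS = {"spidercam", "udrs"}
HASHTAGS = {"#cwc15", "#indvpak"}
HANDLES = {"@scorewithdata"}
WATCH_LIST = KEYWORDS | HASHTAGS | HANDLES

# A token is a run of word characters, optionally led by '#' or '@'.
TOKEN_RE = re.compile(r"[#@]?\w+")


def matched_terms(tweet_text):
    """Return the watch-list terms found in a tweet, ignoring case."""
    tokens = {token.lower() for token in TOKEN_RE.findall(tweet_text)}
    return tokens & WATCH_LIST


def is_relevant(tweet_text):
    """A tweet is kept if it mentions at least one watched term."""
    return bool(matched_terms(tweet_text))
```

Calling `is_relevant("What a view from the Spidercam! #CWC15")` returns `True` here, because both "spidercam" and "#cwc15" are on the sample list. Multi-word terms such as full player names would need phrase matching, which this sketch leaves out.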
Every hour of every day of the Cup, from February 14 to March 29, we ingested relevant tweets and analyzed about 100,000 tweets per match on average, reaching a peak of 1 million during the semi-final and final matches. We could then analyze sentiment about teams, rivalries, players, and play on the field, and run fine-grained temporal analytics around short-lived in-game action like boundaries, sixes and wickets to show which particular events generated the most social media attention.
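One simple way to picture the temporal side is to count the tweets that arrive in a short window after each in-game event and compare the totals. The sketch below assumes a two-minute window and uses hypothetical function names (`tweets_per_event`, `most_discussed`); it illustrates the idea and is not the pipeline we ran.

```python
from datetime import timedelta


def tweets_per_event(events, tweet_times, window=timedelta(minutes=2)):
    """Count tweets posted within `window` after each in-game event.

    events: list of (label, datetime) pairs, e.g. a wicket or a six.
    tweet_times: list of datetimes, one per tweet.
    The two-minute default window is an assumption for illustration.
    """
    counts = {}
    for label, start in events:
        end = start + window
        counts[label] = sum(1 for t in tweet_times if start <= t < end)
    return counts


def most_discussed(events, tweet_times, window=timedelta(minutes=2)):
    """Return the label of the event followed by the most tweets."""
    counts = tweets_per_event(events, tweet_times, window)
    return max(counts, key=counts.get)
```

A fixed window is the crudest choice. Events that fall close together will share tweets, so a real system would also need to attribute each tweet to an event by its content.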
All of these advanced data curation and integration capabilities, coupled with text mining and natural language processing, continuously delivered insights through the @scorewithdata Twitter handle and to CNN-IBN, a leading English news TV channel tracking the Cup, all to give fans a new dimension and connection to the 14 countries and 49 matches.
In addition to temporal analytics and social sentiment analytics, we ranked celebrity tweets during the course of each match using an Influencer Index – a metric that helped predict which tweet would generate the most retweets. We routinely generated interesting insights too, such as the buzz around the retirement “farewell” for Pakistan’s Misbah-ul-Haq and Shahid Afridi, which generated more than four times as much online chatter as the farewells for Sri Lanka’s Kumar Sangakkara and Mahela Jayawardene. Controversial umpires also drew their fair share of unwanted attention. On-field umpire Aleem Dar became the most-talked-about referee during the quarter-final week (with 61 percent of Cup social media chatter) due to controversial “excessive height” and “no ball” decisions in the India-Bangladesh match.
This analysis gave media agency Ogilvy and CNN-IBN splashy graphics to show during matches, like when India’s batsman Virat Kohli had the most buzz in their match against Australia at 46 percent – but 14 percent of it was negative (also the highest among players in this match). It was a clear reflection of his sub-par performance in the eyes of a billion of his countrymen.
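Figures like these come down to two ratios: a player’s share of all player mentions in a match, and the share of that player’s mentions that carry negative sentiment. A minimal sketch of the arithmetic follows, assuming each mention has already been labeled by a sentiment classifier. The labels "pos", "neg" and "neu" and the function name `buzz_report` are hypothetical.

```python
from collections import Counter


def buzz_report(mentions):
    """Summarize buzz share and negative share per player.

    mentions: list of (player, sentiment) pairs, where sentiment is one of
    "pos", "neg" or "neu" (labels assumed to come from an upstream
    sentiment classifier that is not shown here).
    """
    total = len(mentions)
    per_player = Counter(player for player, _ in mentions)
    negative = Counter(player for player, s in mentions if s == "neg")
    return {
        player: {
            # Share of all player mentions in the match.
            "buzz_pct": 100 * count / total,
            # Share of this player's own mentions that are negative.
            "negative_pct": 100 * negative[player] / count,
        }
        for player, count in per_player.items()
    }
```

Note that the negative percentage is computed against the player’s own mentions, not against all tweets in the match, which matches the way “14 percent of it was negative” is phrased above.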
We also shared more specific statistics about in-game actions, like when New Zealand’s Grant Elliott hit the game-winning “towering six” off South Africa’s Dale Steyn. This event generated more tweets than any other boundary, six or wicket in that match, as it signaled the end of a nail-biting thriller.
Sports: The Perfect Data Generator for Sentiment Analysis
Real-time analysis gives fans a closer connection to their favorite teams and players. In this new multimedia world of the second screen, they can rub digital elbows with celebrities, like TV personality Harsha Bhogle during India’s match with Australia.
And they can see how their reactions stack up against others’ – like the fact that, despite the India-Bangladesh match being the most popular of the four quarter-finals, the bowler and the all-rounder with the most buzz were neither Indian nor Bangladeshi, but two Pakistanis (Wahab Riaz and Shahid Afridi, respectively).
Sports, though, isn’t the only area in which IBM Research has delivered insights based on social media. For example, our team analyzed chatter about the 2013 elections in the Philippines. We’re continuously developing and refining our text mining, natural language processing, and data integration algorithms based on what we learn from these engagements – and this is still the beginning.
Improving machines’ ability to understand human language will help them provide better information for everything from a doctor’s medical diagnosis to a meteorologist’s weather forecast.