Our project – Real-time Emotion Analysis on Twitter – is complete. Looking back, we have been thinking about how extensible it is, and we have a few ideas.
We use spouts and bolts to process data. The spout is the data source, like the input files of a Hadoop job. Bolts run on the worker nodes and are like Hadoop workers. In our project, the spout reads the Twitter Streaming data and sends it to several bolts, which play the role of mappers. A bolt can also send data to another bolt, so the bolts form a chain. Whenever we want to add a new feature to the system, we just need to write a new bolt and add it to the processing chain.
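To make the chain idea concrete, here is a minimal sketch in plain Python. It is not the real spout/bolt API and not our actual code: `spout`, `run_chain`, `lowercase_bolt` and `length_bolt` are hypothetical names, and each bolt is modeled as a simple function that takes a tuple and returns a new one.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a bolt is modeled as a function from one tuple (a dict) to another.
Bolt = Callable[[dict], dict]


def spout(raw_tweets: Iterable[str]) -> Iterator[dict]:
    """Stand-in for the spout: emits one tuple per incoming tweet."""
    for text in raw_tweets:
        yield {"text": text}


def run_chain(source: Iterator[dict], bolts: List[Bolt]) -> List[dict]:
    """Pass every tuple through the bolts in order, like a processing chain."""
    results = []
    for item in source:
        for bolt in bolts:
            item = bolt(item)
        results.append(item)
    return results


def lowercase_bolt(item: dict) -> dict:
    """Hypothetical bolt: normalizes the tweet text."""
    return {**item, "text": item["text"].lower()}


def length_bolt(item: dict) -> dict:
    """Hypothetical bolt: adds the length of the tweet text."""
    return {**item, "length": len(item["text"])}


# Adding a feature means appending one more bolt to this list.
chain_output = run_chain(spout(["Hello World"]), [lowercase_bolt, length_bolt])
```

In this sketch the spout and the existing bolts stay untouched when the list grows, which is the property we rely on.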
There are two kinds of bolts in our project. The first kind analyzes each tweet, extracts the useful information, and adds the emotion value. The data is then sent to the second kind, which gathers the results from several source bolts and publishes them to a Redis channel.
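The two kinds of bolts could look roughly like the sketch below. Everything here is an assumption made for illustration: the word list in `EMOTION_WORDS` is made up and is not our real scoring method, and `InMemoryChannel` is a stand-in so the example runs without a Redis server. With a real server, a client object that offers a `publish(channel, message)` method (redis-py's `Redis` class has one) could be passed in instead.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical word scores, for illustration only.
EMOTION_WORDS: Dict[str, int] = {"happy": 1, "love": 1, "sad": -1, "angry": -1}


def analysis_bolt(tweet: dict) -> dict:
    """First kind of bolt: keeps the useful fields and adds an emotion value."""
    words = tweet["text"].lower().split()
    emotion = sum(EMOTION_WORDS.get(word, 0) for word in words)
    return {"user": tweet["user"], "text": tweet["text"], "emotion": emotion}


class InMemoryChannel:
    """Stand-in for a Redis client; it only records what was published."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def publish(self, channel: str, message: str) -> None:
        self.messages.append((channel, message))


class CollectorBolt:
    """Second kind of bolt: gathers results and publishes them to one channel."""

    def __init__(self, publisher, channel: str) -> None:
        self.publisher = publisher
        self.channel = channel

    def process(self, item: dict) -> None:
        self.publisher.publish(self.channel, json.dumps(item, sort_keys=True))


publisher = InMemoryChannel()
collector = CollectorBolt(publisher, "emotions")  # channel name is an assumption
for tweet in [
    {"user": "alice", "text": "I love sunny days"},
    {"user": "bob", "text": "So sad and angry today"},
]:
    collector.process(analysis_bolt(tweet))
```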
This is where our scalability and extensibility come from. For example, if we also want to analyze hashtags, all we need to do is add a new kind of bolt to the chain. We can then send its results either to the reducer (the collecting bolt) or directly to a separate Redis channel.
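As a rough illustration of the hashtag case, the sketch below adds a hypothetical `hashtag_bolt` next to a placeholder for the existing emotion bolt. The channel names, the `handle` helper, and the dictionary that stands in for Redis channels are all assumptions, and the hashtag pattern is a simplification of what Twitter actually accepts.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List

# Simplified pattern: a '#' followed by word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")

# Stand-in for Redis channels: channel name -> published items.
channels: DefaultDict[str, List[dict]] = defaultdict(list)


def publish(channel: str, item: dict) -> None:
    channels[channel].append(item)


def emotion_bolt(tweet: dict) -> dict:
    """Placeholder for the existing analysis bolt; real scoring is omitted."""
    return {"id": tweet["id"], "emotion": 0}


def hashtag_bolt(tweet: dict) -> dict:
    """The new bolt: extracts the hashtags from the tweet text."""
    return {"id": tweet["id"], "hashtags": HASHTAG_PATTERN.findall(tweet["text"])}


def handle(tweet: dict, hashtag_channel: str) -> None:
    """Run both bolts on one tweet and publish each result."""
    publish("emotions", emotion_bolt(tweet))
    # Pass "emotions" to reuse the existing channel, or another name to
    # keep hashtag results on a channel of their own.
    publish(hashtag_channel, hashtag_bolt(tweet))


handle({"id": 1, "text": "Great game #NBA #finals"}, hashtag_channel="hashtags")
```

The existing emotion bolt is not modified here; only the new bolt and one extra publish call are added.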