Students for a Smarter Planet ..leaders with conscience
November 7th, 2013
15:08
 

Posted by hongfei


We ran into a problem where our map tasks generated a large amount of intermediate results, which degraded performance considerably. After searching online for solutions, we decided to set up our Hadoop cluster with the LZO module to compress both map and reduce output. A few good resources and tutorials for setting up LZO on Hadoop are listed below:

  • http://www.oberhumer.com/opensource/lzo/
  • https://github.com/twitter/hadoop-lzo
  • https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1

Steps for setting up the LZO package:

  1. Download the LZO package: git clone git://github.com/toddlipcon/hadoop-lzo.git
  2. Install the required tools: lzo-devel, ant, java and gcc
  3. Build it with ant: ant clean compile-native tar
  4. Copy the built library to ~/hadoop-1.2.1/lib/native/Linux-amd64-64/ on all nodes (master and slaves)
  5. Add the LZO configuration to both core-site.xml and mapred-site.xml
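
The post does not show the settings added in step 5, so the fragment below is only a sketch of what they typically look like on Hadoop 1.x. The property names and codec classes follow the hadoop-lzo README; which codecs are listed, and whether final job output is compressed as well as map output, are assumptions and may differ from what we actually used. Each group of properties goes inside the existing <configuration> element of the file named in the comment.

```xml
<!-- core-site.xml: register the LZO codecs next to the built-in ones (assumed list) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

<!-- mapred-site.xml: compress intermediate map output with LZO -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

<!-- mapred-site.xml: optionally compress final job output too (assumption) -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzopCodec</value>
</property>
```

The same files need to be updated on every node, and the cluster restarted, before the new codecs take effect.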

 

 

 
