Wednesday, September 25, 2013

Hadoop & compression!

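Compression with Hadoop is great! It reduces disk I/O and network traffic, lets you store more data in the same space, and most of the time makes your Hive/Pig/MapReduce jobs somewhat faster, because the CPU spent on compression is usually cheaper than the I/O it saves.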

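Depending on your needs, consider Snappy, LZO, LZ4, bzip2 or gzip. gzip and especially bzip2 give better compression ratios but are slower, while Snappy and LZ4 are much faster with a lower ratio. Also, as plain text files, only bzip2 (and LZO, once indexed) can be split across several mappers. For example, to make Hive compress the output of its queries with Snappy: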
 
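-- Compress the final output written by the query
SET hive.exec.compress.output=true;
-- Compress whole blocks of records (only affects SequenceFile output)
SET mapred.output.compression.type=BLOCK;
-- Codec used for the output files
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;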
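The settings above only compress the final output. Here is a minimal sketch, assuming the same Hadoop 1.x-style property names used above, that also compresses the files Hive writes between the MapReduce jobs of a multi-stage query and the map output sent over the network during the shuffle:

```sql
-- Compress intermediate files written between the MapReduce jobs of a multi-stage query
SET hive.exec.compress.intermediate=true;
-- Compress map output before it is shuffled to the reducers
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```

On Hadoop 2 the map output properties are also available as mapreduce.map.output.compress and mapreduce.map.output.compress.codec (the old names are deprecated but still honored). A fast codec such as Snappy is usually a good fit here, since this data is written and read only once.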
