Wednesday, April 10, 2013

Storm & Hadoop/Hive partitioned load!

I am currently working on how Storm can load data into a partitioned Hadoop/Hive table.

This is how I do it:
  • put the Hadoop libs into the Storm lib directory
  • add the Hadoop XML configuration files and load them with conf.addResource()
  • create an HDFSBolt (implements IRichBolt)
  • keep a private HashMap<String, FSDataOutputStream> mapping each partition name to its open output stream
  • override the execute() method: if a stream for the tuple's partition already exists, write to that buffer; otherwise open a new one and add it to the map
You can also choose to do the partitioning with a Storm grouping (e.g. a fields grouping on the partition key), which limits the number of partitions each worker has to keep open!
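The per-partition buffering inside execute() can be sketched as below. This is a simplified, self-contained illustration of the pattern only: local BufferedWriters stand in for HDFS FSDataOutputStream, the class and partition names are hypothetical, and the Storm/Hadoop wiring (Tuple handling, FileSystem.create(), cleanup on worker shutdown) is omitted.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Sketch of the partition -> open stream map used by the bolt.
// Local files replace HDFS here so the example runs standalone.
public class PartitionWriterSketch {
    // one open writer per partition, created lazily on first use
    private final Map<String, BufferedWriter> writers = new HashMap<String, BufferedWriter>();
    private final File baseDir;

    public PartitionWriterSketch(File baseDir) {
        this.baseDir = baseDir;
    }

    // analogous to execute(Tuple): route each record to its partition's buffer
    public void write(String partition, String record) throws IOException {
        BufferedWriter w = writers.get(partition);
        if (w == null) {
            // partition not seen yet: create its directory and open a new buffer
            File dir = new File(baseDir, "dt=" + partition);
            dir.mkdirs();
            w = new BufferedWriter(new FileWriter(new File(dir, "part-0000"), true));
            writers.put(partition, w);
        }
        w.write(record);
        w.newLine();
    }

    // analogous to cleanup(): flush and close every open buffer
    public void close() throws IOException {
        for (BufferedWriter w : writers.values()) {
            w.close();
        }
        writers.clear();
    }

    public static void main(String[] args) throws IOException {
        File tmp = new File(System.getProperty("java.io.tmpdir"), "partition-sketch");
        PartitionWriterSketch sketch = new PartitionWriterSketch(tmp);
        sketch.write("2013-04-10", "event-1");
        sketch.write("2013-04-10", "event-2");
        sketch.write("2013-04-11", "event-3");
        sketch.close();
        System.out.println(new File(tmp, "dt=2013-04-10/part-0000").exists()); // prints "true"
    }
}
```

In the real bolt the writers would be FSDataOutputStream instances obtained from the Hadoop FileSystem, and the "dt=..." directories would be the Hive partition directories under the table's warehouse path.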