Showing posts with label Storm. Show all posts

Friday, November 6, 2015

Hadoop Mini Clusters !

A nice project here for local testing and development ! You can find other interesting projects in the Hortonworks gallery.

Sunday, September 21, 2014

Summingbird !

Last week, I went to a meetup about streaming platforms, and a great speaker presented Summingbird : a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.

Wednesday, July 23, 2014

My Hadoop is not working, what can I do ?

Keep calm and ;-)
  • First check your logs
  • Is the service running ? (netstat -nat | grep ...)
  • Is it possible to access it ? (telnet ip port)
  • Is there a problem linked with paths, Java libraries, environment variables or exec ?
  • Am I using the correct user ? 
  • What is the security system in place ?
  • Are nodes well synchronized ?
  • What about memory issues ? (swap should also be disabled)
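
If you want to script the "telnet ip port" check from the list above, here is a minimal sketch in plain Java. The class name PortCheck and the method isReachable are hypothetical names chosen for illustration, and the timeout is an arbitrary choice; it only tells you whether a TCP connection can be opened, not whether the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: a scripted stand-in for "telnet ip port". */
class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host: all count as "not reachable".
            return false;
        }
    }
}
```

Calling PortCheck.isReachable(host, port, 2000) with the host and port of the daemon you suspect gives a quick yes/no before you dig into the logs.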

Friday, October 18, 2013

Hadoop 2.0 !

Apache Hadoop 2.0 was released a few days ago ! Hadoop is no longer only a MapReduce container but a multi data-framework container, and it provides High Availability, HDFS Federation, NFS and snapshots !

Wednesday, April 10, 2013

Storm & Hadoop/Hive partitioning load !

I am currently working on how Storm can load data into a partitioned Hadoop/Hive table.

This is how I do it :
  • put the Hadoop libs into the Storm lib directory
  • add the Hadoop XML conf files and load them using conf.addResource();
  • create a HDFSBolt (implements IRichBolt)
  • add a private HashMap<String, FSDataOutputStream> that maps each partition to its open output stream
  • override the execute function (if the partition already exists, reuse its current stream; otherwise create a new one)
You can also choose to do the partitioning with a Storm grouping and so limit the number of partitions per worker !
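
The partition-to-stream cache at the heart of the steps above can be sketched without any Storm or Hadoop dependency. This is only an illustration under assumptions, not the actual bolt: PartitionedWriter and StreamOpener are hypothetical names, a plain java.io.OutputStream stands in for FSDataOutputStream, and in a real bolt the opener would presumably wrap FileSystem.create(Path).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the per-partition stream cache an HDFS bolt could keep. */
class PartitionedWriter {
    /** Opens the stream for a new partition (assumed to wrap FileSystem.create(Path) in a real bolt). */
    interface StreamOpener {
        OutputStream open(String partition) throws IOException;
    }

    private final Map<String, OutputStream> streams = new HashMap<>();
    private final StreamOpener opener;

    PartitionedWriter(StreamOpener opener) {
        this.opener = opener;
    }

    /** What execute() would do per tuple: reuse the partition's stream, or open one first. */
    void write(String partition, String record) throws IOException {
        OutputStream out = streams.get(partition);
        if (out == null) {
            out = opener.open(partition);
            streams.put(partition, out);
        }
        out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Number of partitions that currently have an open stream. */
    int openPartitions() {
        return streams.size();
    }

    /** What cleanup() would do: close every open stream and forget it. */
    void close() throws IOException {
        for (OutputStream out : streams.values()) {
            out.close();
        }
        streams.clear();
    }
}
```

Every distinct partition value keeps one stream open, which is why routing tuples by partition with a grouping helps: each worker then sees only a subset of the partitions and holds fewer open files.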

Friday, March 22, 2013

Storm & real-time ETL !

Storm is an amazing scalable, fault-tolerant, open-source, real-time ETL tool. Let's storm !

Saturday, March 16, 2013

Main Storm daemons !

  • Nimbus (the master daemon, Storm's equivalent of the Hadoop JobTracker)
  • Supervisor (The supervisor daemon is responsible for starting and stopping worker processes)
  • UI (administration website)

Saturday, October 20, 2012

Real time Hadoop !

Want to use Hadoop for real time processing ? Then use Flume for collecting, Storm for calculation and HBase for handling client IO !