Monday, February 10, 2014

HDP [>] 2.1 natively available applications !

Stack components :
  • MapReduce (API v1 & v2) : software framework for processing vast amounts of data
  • Tez : framework, more general than MapReduce, for executing a DAG (directed acyclic graph) of tasks
  • HOYA (HBase on YARN) : distributed, column-oriented database
  • Accumulo : (Linux only) sorted, distributed key/value store
  • Hue : web interface for the Hadoop ecosystem (Hive, Pig, HDFS, ...)
  • HDFS : Hadoop Distributed File System
  • WebHDFS : interact with HDFS over HTTP (no client library needed)
  • WebHCat : interact with HCatalog over HTTP (no client library needed)
  • YARN : Yet Another Resource Negotiator, allows more applications to run on Hadoop
  • Oozie : workflow / coordination system
  • Mahout : machine-learning libraries that run their computations on MapReduce
  • ZooKeeper : centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
  • Flume : data ingestion and streaming tool
  • Sqoop : import data from databases into Hadoop and export it back
  • Pig : scripting platform for analyzing large data sets
  • Hive : tool to query the data using a SQL-like language
  • Solr : platform for indexing and search
  • HCatalog : metadata management service
  • Ambari : set up, monitor and configure your Hadoop cluster
  • Phoenix : SQL layer over HBase
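
To illustrate the "no client library needed" point for WebHDFS, here is a minimal sketch that only builds the REST URL for a request. The host name, user name and the helper `webhdfs_url` are hypothetical; the `/webhdfs/v1` prefix and the `op` parameter follow the WebHDFS REST API, and 50070 is assumed to be the NameNode HTTP port (the Hadoop 2 default, it may differ on your cluster).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>.

    `webhdfs_url` is a hypothetical helper, not part of any Hadoop library.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # 'op' comes first, then any extra query parameters (e.g. user.name).
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical NameNode host and user; list the content of a directory.
    print(webhdfs_url("namenode.example.com", "/user/alice", "LISTSTATUS",
                      params={"user.name": "alice"}))
```

The resulting URL can be fetched with any HTTP client (curl, a browser, `urllib.request`), which is the whole point of WebHDFS.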
Components being developed / integrated :
  • Spark : in-memory engine for large-scale data processing
  • Falcon : data management framework
  • Knox : single point of secure access for Apache Hadoop clusters (uses WebHDFS)
  • Storm : distributed real-time computation system
  • Kafka : publish-subscribe messaging system
  • Giraph : iterative graph processing system
  • OpenMPI : high performance message passing library
  • S4 : stream computing platform
  • Samza : distributed stream processing framework
  • R : programming language and environment for statistical computing and graphics
What else ;-) ?
