- MapReduce (API v1 & v2) : software framework for processing vast amounts of data
- Tez : framework for executing a DAG (directed acyclic graph) of tasks, more flexible than MapReduce
- HOYA (HBase on YARN) : distributed, column-oriented database
- Accumulo : (Linux only) sorted, distributed key / value store
- Hue : web application interface for Hadoop ecosystem (Hive, Pig, HDFS, ...)
- HDFS : Hadoop Distributed File System
- WebHDFS : interact with HDFS over HTTP (no client library needed)
- WebHCat : interact with HCatalog over HTTP (no client library needed)
- YARN : Yet Another Resource Negotiator, allows more applications to run on Hadoop
- Oozie : workflow / coordination system
- Mahout : machine-learning libraries that run their computations on MapReduce
- ZooKeeper : centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
- Flume : data ingestion and streaming tool
- Sqoop : transfer data between Hadoop and databases (import and export)
- Pig : scripting platform for analyzing large data sets
- Hive : tool to query the data using a SQL-like language
- Solr : platform for indexing and search
- HCatalog : metadata management service
- Ambari : set up, monitor and configure your Hadoop cluster
- Phoenix : SQL layer over HBase
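
As an illustration of the WebHDFS entry above, here is a minimal sketch of how a request URL can be built with nothing but the Python standard library. The `/webhdfs/v1/<path>?op=...` layout and the `LISTSTATUS` operation come from the WebHDFS REST API; the helper name `webhdfs_url`, the host name and the port 50070 (the usual NameNode HTTP port on Hadoop 2.x) are assumptions chosen for the example and will differ on your cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an HDFS path and an operation.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    # The operation and any extra parameters go in the query string.
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Host and port are placeholders, not values from a real cluster.
print(webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS"))
```

The resulting URL can be fetched with any HTTP client (a browser, curl, `urllib.request`), which is what "no client library needed" means in practice.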
Components being developed / integrated :
- Spark : in-memory engine for large-scale data processing
- Falcon : data management framework
- Knox : single point of secure access for Apache Hadoop clusters (uses WebHDFS)
- Storm : distributed real-time computation system
- Kafka : publish-subscribe messaging system
- Giraph : iterative graph processing system
- OpenMPI : high performance message passing library
- S4 : stream computing platform
- Samza : distributed stream processing framework
- R : programming language and environment for statistical computing and graphics