Over the years, Hadoop has become something of an essential, and a number of projects have grown up around it. In fact, Hadoop is now more than just a stack of code: its ecosystem grows every day as more and more tools are built around it. Here are a few of the most popular tools today.

Ambari: This tool gives the user a web-based GUI along with wizard scripts, which can be used to set up clusters with the standard Hadoop components. The code for this can be found here.

Hadoop Distributed File System (HDFS): HDFS breaks large files into smaller blocks and spreads those blocks across the nodes of the cluster, so the pieces of a single file may live on many machines. You can find this one here; a small Java sketch of writing and reading a file appears further down.

HBase: If the data in question fits the big-table model, HBase will store it, let you search through it, and automatically shard it across nodes. Find this system here; a short client sketch appears further down.

ZooKeeper: Hadoop runs across many machines, and as clusters grow it becomes harder to keep track of and coordinate them. This is where ZooKeeper comes in: it maintains a hierarchical namespace for the cluster and stores metadata about the machines. You can get more on ZooKeeper here; a small example of its namespace appears further down.

NoSQL: HBase and HDFS cover a lot of ground, but sometimes a dedicated NoSQL data store is a better fit for the data.

Mahout: Mahout aims to bring algorithms for data analysis, filtering and classification to Hadoop. It already includes implementations such as k-Means clustering, Dirichlet process clustering, parallel frequent pattern mining and Bayesian classification. You can get this under the Apache license here.

Lucene/Solr: This is a tool for distributed text indexing and search, written in Java, and it integrates with Hadoop easily. It can be found here; a minimal indexing and search sketch appears at the end of this piece.
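To make the HDFS description above concrete, here is a minimal sketch, in Java, of writing a file into HDFS and reading it back through the FileSystem API. The namenode address hdfs://namenode:9000 and the path /tmp/hello.txt are placeholders chosen for illustration, not values from the article.

```java
// Minimal sketch: write a file to HDFS and read it back.
// "hdfs://namenode:9000" and "/tmp/hello.txt" are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder namenode address

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/hello.txt");

        // Write a small file; HDFS transparently splits large files into blocks
        // and spreads those blocks across the cluster's data nodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back; the client fetches the blocks from whichever nodes hold them.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}
```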
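Below is a minimal sketch of how a client stores and fetches a single cell with the HBase Java API. The table name "users", the column family "info" and the row key "user1" are assumptions made for illustration; the table would need to exist already (for example, created from the HBase shell), and the cluster settings are read from an hbase-site.xml on the classpath.

```java
// Minimal sketch: put one cell into an HBase table and get it back.
// Table "users", family "info" and row "user1" are assumed placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Store one cell: row key "user1", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("user1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);

            // Fetch it back; HBase routes the request to the region server owning the row.
            Result result = table.get(new Get(Bytes.toBytes("user1")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```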
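The following sketch shows ZooKeeper's hierarchical namespace in action: a client connects, creates a znode under /cluster holding a little metadata, and reads it back. The ensemble address zk-host:2181, the znode paths and the metadata string are all placeholders chosen for this example.

```java
// Minimal sketch: create a znode in ZooKeeper's tree and read its data back.
// "zk-host:2181" and the "/cluster/worker-1" path are placeholders.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

public class ZooKeeperExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // Connect to the ensemble; the watcher fires once the session is established.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Znodes form a tree, much like a file system; each node can hold a small
        // blob of metadata about a machine or service.
        if (zk.exists("/cluster", false) == null) {
            zk.create("/cluster", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        String path = "/cluster/worker-1";
        if (zk.exists(path, false) == null) {
            zk.create(path,
                    "host=worker-1,port=8042".getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8));

        zk.close();
    }
}
```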
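Finally, a minimal Lucene sketch (Lucene 8.x-style API) of indexing one document and searching it. It uses an in-memory directory so the example is self-contained; a real deployment would index into durable storage, and Solr would typically sit on top of Lucene as the server layer. The field name "content" and the sample text are assumptions for illustration only.

```java
// Minimal sketch: index one document with Lucene and run a query against it.
// The "content" field and sample text are placeholders.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LuceneExample {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory index for the sketch
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index a single document with one analyzed, stored text field.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("content",
                    "Hadoop stores data across a cluster", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Search the index for documents matching "cluster" and print the hits.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("content", analyzer).parse("cluster");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("content"));
            }
        }
    }
}
```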