Tuesday, 13 September 2016

Hadoop 2.0 Cluster Architecture

Hadoop 2.0 Cluster Architecture Federation


1) In Hadoop 1.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if it failed, the whole cluster was unavailable until the NameNode was restarted or brought up on a separate machine.



2) In Hadoop 2.0, the ResourceManager takes over the role of the JobTracker and the NodeManager takes over the role of the TaskTracker.
3) The HDFS High Availability (HA) feature in Hadoop 2.0 addresses the NameNode single point of failure by running two redundant NameNodes in the same cluster. This allows a fast failover to the other NameNode if the active one fails.
4) An HA cluster has two NameNodes: an Active NameNode and a Standby NameNode. The DataNodes send block reports to both NameNodes. Every namespace change is recorded in the shared edit log; only the Active NameNode writes to it, while the Standby NameNode periodically reads it to keep its own state up to date. Fencing enforces this single-writer rule by preventing a previously active NameNode from continuing to write to the edit log after a failover. If the Active NameNode fails, the Standby NameNode takes over and becomes the new Active NameNode.
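
The Active/Standby behaviour in points 3 and 4 can be illustrated with a small toy model. This is only a sketch in plain Python, not Hadoop code: the class names SharedEditLog and NameNode, their methods, and the use of an epoch counter as the fencing mechanism are all assumptions made for illustration. Real HDFS HA is configured in hdfs-site.xml and has its own fencing mechanisms.

```python
class SharedEditLog:
    """Toy shared edit log. The epoch acts as a simple fencing token (assumption)."""

    def __init__(self):
        self.entries = []
        self.epoch = 0

    def new_epoch(self):
        # Called by a NameNode that is becoming active; invalidates older writers.
        self.epoch += 1
        return self.epoch

    def append(self, epoch, op):
        # Only the writer holding the current epoch may append.
        if epoch != self.epoch:
            raise PermissionError("fenced: stale or missing writer epoch")
        self.entries.append(op)


class NameNode:
    """Toy NameNode that is either 'active' (writer) or 'standby' (reader)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.state = "standby"
        self.epoch = None
        self.namespace = []  # operations applied to this node's in-memory state
        self.applied = 0     # number of log entries already applied

    def catch_up(self):
        # The standby periodically reads new entries from the shared edit log.
        for op in self.log.entries[self.applied:]:
            self.namespace.append(op)
        self.applied = len(self.log.entries)

    def become_active(self):
        # Apply any remaining edits, then claim the log with a fresh epoch.
        self.catch_up()
        self.epoch = self.log.new_epoch()
        self.state = "active"

    def write(self, op):
        # Raises PermissionError if this node has been fenced off.
        self.log.append(self.epoch, op)
        self.namespace.append(op)
        self.applied += 1


log = SharedEditLog()
nn1 = NameNode("nn1", log)
nn2 = NameNode("nn2", log)

nn1.become_active()
nn1.write("mkdir /data")
nn1.write("create /data/a")
nn2.catch_up()  # standby follows the active through the edit log

# Failover: nn2 takes over; nn1 still believes it is active.
nn2.become_active()
fenced = False
try:
    nn1.write("delete /data/a")
except PermissionError:
    fenced = True  # the old active can no longer write to the log

nn2.write("create /data/b")
```

After the failover, the write attempted by nn1 is rejected, so the shared edit log only ever contains edits from the NameNode that is currently active, and nn2 ends up with the full namespace history.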
