Modern Big Data Processing with Hadoop
V. Naresh Kumar, Prashant Shindgikar
Data replication
The HDFS architecture supports placing very large files across the machines in a cluster. Each file is stored as a series of blocks. To ensure fault tolerance, each block is stored on three different machines by default. This is known as the replication factor, and it can be changed at the cluster level or for an individual file. The NameNode makes all decisions related to block replication. It receives heartbeats and block reports from each DataNode: a heartbeat confirms that the DataNode is alive, while a block report lists all the blocks stored on that DataNode.
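As a minimal sketch of how the replication factor is set at each level: the cluster-wide default comes from the dfs.replication property in hdfs-site.xml, and a single file can be overridden with the hdfs dfs -setrep command. The path /user/hadoop/sales.csv below is a hypothetical example.

```sh
# Cluster-wide default: set dfs.replication in hdfs-site.xml
# (the shipped default is 3).

# Per-file override: set the replication factor of one file to 2;
# -w waits until the re-replication actually completes.
hdfs dfs -setrep -w 2 /user/hadoop/sales.csv

# Verify: the second column of the listing shows the file's
# current replication factor.
hdfs dfs -ls /user/hadoop/sales.csv
```

Raising the factor improves fault tolerance and read parallelism at the cost of storage; lowering it reclaims space for data that can be regenerated.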