hadoop, hadoop-yarn, administration

Recommendations for cluster node resources in Hadoop?


Is it recommended to use the same resources (CPU and RAM) on all machines in a cluster?


Solution

  • The infrastructure configuration of your cluster is determined by the business case you are building it for, which in turn translates into the data processing requirements the cluster must meet to achieve the business outcome. In general, Hadoop systems were initially designed with the notion that a cluster would contain machines with heterogeneous configurations. (Server vendors now offer machines optimized for Hadoop workloads, with some variability in disk sizing between masters and slaves.)

    To address your question specifically: I have seen sites running clusters of up to 50 nodes with exactly the same configuration for masters and slaves (which I thought was a bit of overkill). Quite often, architectural design decisions do not determine procurement decisions.

    The following links from three major Hadoop distribution providers are a good starting point for understanding cluster design and applying site-specific parameters (e.g. data processing needs, data growth, data retention, replication):

    Hortonworks:

    https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_cluster-planning/bk_cluster-planning.pdf

    Cloudera:

    https://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

    MapR:

    http://doc.mapr.com/display/MapR/Planning+Cluster+Hardware
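
To make the site-specific parameters mentioned above (data growth, retention, replication) a bit more concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from any of the vendor guides: the function name, the 25% headroom and the 48 TB of usable disk per node are all assumptions chosen for illustration. Only the replication factor of 3 reflects the HDFS default.

```python
import math

def nodes_needed(daily_ingest_tb, retention_days, replication=3,
                 overhead=0.25, usable_tb_per_node=48.0):
    """Rough number of worker nodes needed to store the retained data.

    All parameters are illustrative assumptions; replace them with
    figures measured at your own site.
    """
    # Logical data kept on the cluster at steady state.
    logical_tb = daily_ingest_tb * retention_days
    # HDFS stores each block `replication` times (default is 3).
    raw_tb = logical_tb * replication
    # Assumed headroom for temporary/intermediate data and free space.
    raw_tb_with_headroom = raw_tb * (1 + overhead)
    return math.ceil(raw_tb_with_headroom / usable_tb_per_node)

# 1 TB/day kept for a year: 365 * 3 * 1.25 = 1368.75 TB raw
print(nodes_needed(daily_ingest_tb=1.0, retention_days=365))  # prints 29
```

This only covers storage. CPU and RAM sizing depend on the processing workload, which is the point where heterogeneous node configurations tend to come up.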