
Replication factor


I am new to Hadoop and I want to understand how to determine the highest replication factor we can have for a given cluster. I know that the default setting is 3 replicas, but if I have a cluster with 5 nodes, what is the highest replication factor that I can use in that case? Is there a formula that we have to follow to determine the replication factor?

Thank you


Solution

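  • The highest replication factor that you can use depends on three things: the number of nodes in your cluster (as @Tarik said, you cannot have more replicas than nodes, because HDFS places at most one replica of a block on each DataNode), your expected usage (how much data you plan to store), and your cluster's storage capacity. With 5 nodes the hard upper limit is therefore 5, and the practical limit is lower if the cluster cannot hold that many full copies of your data.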

    This other SO question has some calculations on capacity and storage use.
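
As a rough illustration of how those three factors combine, here is a minimal Python sketch. The function name, its parameters, and the numbers in the usage example are all hypothetical, and the estimate is deliberately simplified: it treats every replica as a full copy of the data and ignores space needed for intermediate job output, logs, and headroom, so a real cluster should stay below the value it returns.

```python
def max_replication_factor(num_datanodes, usable_capacity_tb, expected_data_tb):
    """Rough upper bound on the replication factor (illustrative only).

    All three parameters are assumptions made for this sketch:
    num_datanodes      -- number of DataNodes in the cluster
    usable_capacity_tb -- total disk space available to HDFS, in TB
    expected_data_tb   -- size of the data before replication, in TB
    """
    if num_datanodes < 1:
        raise ValueError("the cluster needs at least one DataNode")
    if expected_data_tb <= 0:
        raise ValueError("expected data size must be positive")

    # A block has at most one replica per DataNode, so the node count
    # is a hard ceiling on the replication factor.
    node_limit = num_datanodes

    # Each replica is a full copy of the data, so the cluster can hold
    # at most capacity / data_size complete copies.
    capacity_limit = int(usable_capacity_tb // expected_data_tb)

    return min(node_limit, capacity_limit)


# Hypothetical 5-node cluster with 4 TB of usable disk per node.
print(max_replication_factor(5, 20, 6))  # limited by capacity
print(max_replication_factor(5, 20, 2))  # limited by node count
```

In the first call the cluster only has room for three full copies of 6 TB, so capacity is the limiting factor; in the second call the data is small enough that the node count (5) becomes the limit. A result of 0 would mean the data does not fit on the cluster even once.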