hadoop hdfs hadoop2 high-availability

What are the pros and cons of Hadoop HA QJM and NFS?


Are there any rules for deciding when to use QJM and when to use NFS for Hadoop High Availability?


Solution

  • QJM is generally the better choice, because it avoids the single point of failure that the NFS option has.

    From the Apache documentation page on the shared-storage (NFS) approach:

    In order for the Standby node to keep its state synchronized with the Active node, the current implementation requires that the two nodes both have access to a directory on a shared storage device (eg an NFS mount from a NAS). This restriction will likely be relaxed in future.

    If the NFS mount is down or has problems, high availability cannot be achieved, because the shared directory is itself a single point of failure.

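    With QJM, the edits are written to multiple JournalNodes, and a write only needs to reach a majority of them. With N JournalNodes the cluster tolerates up to (N - 1) / 2 JournalNode failures, so the probability of losing the shared edit log is lower than with the NFS option.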

    Related SE question:

    Secondary NameNode usage and High availability in Hadoop 2.x
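
To make the QJM point above concrete, here is a minimal Python sketch of the majority-quorum arithmetic. It is only an illustration, not Hadoop code: the function names `tolerated_journalnode_failures` and `can_write_edits` are hypothetical, and the single NFS mount is modelled, as an assumption, as a "quorum" of size 1.

```python
def tolerated_journalnode_failures(journal_nodes: int) -> int:
    """Number of JournalNodes that may fail while a majority is still reachable."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    # A majority of N nodes is N // 2 + 1, so (N - 1) // 2 nodes may be lost.
    return (journal_nodes - 1) // 2


def can_write_edits(journal_nodes: int, failed: int) -> bool:
    """True if the nodes still alive form a majority of the configured set."""
    alive = journal_nodes - failed
    return alive >= journal_nodes // 2 + 1


if __name__ == "__main__":
    # Size 1 stands in for a single shared NFS directory (illustrative assumption).
    for n in (1, 3, 5):
        print(f"{n} node(s): tolerates {tolerated_journalnode_failures(n)} failure(s)")
```

This is why an odd number of JournalNodes (at least three) is the usual recommendation: going from three to four nodes does not raise the number of failures that can be survived.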