hadoop · hdfs · cloudera · cloudera-manager

Change IP address of a Hadoop HDFS data node server and avoid Block pool errors


I'm using the Cloudera distribution of Hadoop and recently had to change the IP addresses of a few nodes in the cluster. After the change, one of the nodes (old IP: 10.88.76.223, new IP: 10.88.69.31) fails with the following error when I try to start the datanode service.

Initialization failed for block pool Block pool BP-77624948-10.88.65.174-13492342342 (storage id DS-820323624-10.88.76.223-50010-142302323234) service to hadoop-name-node-01/10.88.65.174:6666
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(10.88.69.31, storageID=DS-820323624-10.88.76.223-50010-142302323234, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1486084428;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:656)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3593)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:899)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)

Has anyone had success changing the IP address of a Hadoop datanode and joining it back to the cluster without data loss?


Solution

  • Turns out it's better to:

    1. Decommission the server from the cluster to ensure that all blocks are replicated to other nodes in the cluster.
    2. Remove the server from the cluster.
    3. Connect to the server, change its IP address, then restart the Cloudera agent.
    4. Notice that Cloudera Manager now shows two entries for this server. Delete the entry with the old IP and the longest heartbeat time.
    5. Add the server back to the required cluster and add the required roles back to it (e.g. HDFS DataNode, HBase RegionServer, YARN).
    6. HDFS will read all data disks and recognize the block pool and cluster IDs, then register the datanode.

    All data will be available and the process will be transparent to any client.
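    To double-check steps 1 and 6 from a client machine rather than through the Cloudera Manager UI, a minimal Java sketch along the following lines can list the datanodes the namenode currently knows about (the class name is hypothetical, and the namenode URI is taken from the error above; substitute your own fs.defaultFS). Before step 2 the node should report as decommissioned; after step 6 it should reappear under the new IP:

        import java.net.URI;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.hdfs.DistributedFileSystem;
        import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

        public class ListDatanodes {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Namenode address taken from the log above -- replace with your own fs.defaultFS.
                FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop-name-node-01:6666"), conf);
                DistributedFileSystem dfs = (DistributedFileSystem) fs;

                // One DatanodeInfo entry per datanode registered with the namenode.
                for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                    System.out.println(dn.getHostName()
                            + "  " + dn   // prints the address the node registered with
                            + "  decommissionInProgress=" + dn.isDecommissionInProgress()
                            + "  decommissioned=" + dn.isDecommissioned());
                }
                fs.close();
            }
        }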

    NOTE: If HDFS clients start throwing name resolution or connectivity errors after the change, they have most likely cached the old IP and need to be restarted. This applies in particular to Java clients that previously referenced this server (e.g. HBase clients), because the JVM can cache resolved IPs indefinitely; such clients will keep failing to reach the node at its old address until they are restarted.
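    If restarting every affected client is impractical, one mitigation (a standard JVM knob, not part of the original fix) is to give the JVM a finite DNS cache TTL before the client opens any connections. A minimal sketch, with a hypothetical class name and illustrative TTL values:

        import java.security.Security;

        public class ClientDnsTtl {
            public static void main(String[] args) {
                // Set these before the client performs its first hostname lookup;
                // the JVM reads its InetAddress cache policy very early.
                Security.setProperty("networkaddress.cache.ttl", "60");          // seconds to cache successful lookups
                Security.setProperty("networkaddress.cache.negative.ttl", "10"); // seconds to cache failed lookups

                // ... create HBase/HDFS connections after this point ...
            }
        }

    The same effect can usually be achieved at JVM startup with the legacy -Dsun.net.inetaddr.ttl system property, but simply restarting the client remains the most reliable fix.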