
ZK HBase replication znode grows exponentially even though HBase data is replicated properly to the peer


In HBase 1.4.10, I have enabled replication for all tables and configured a peer. list_peers gives the following result:

 hbase(main):001:0> list_peers  
PEER_ID CLUSTER_KEY ENDPOINT_CLASSNAME STATE TABLE_CFS BANDWIDTH  
1 10.XX.221.XX,10.XX.234.XX,10.XX.212.XX:2171:/hbase nil ENABLED nil 0 
1 row(s) in 0.1430 seconds

But status 'replication' shows a replication lag:

hbase(main):002:0> status 'replication'
version 1.4.10
3 live servers
    10.XX.232.XX:
       SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=**1619545264329**
       SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27 23:09:23 IST 2021
    10.XX.118.XX:
       SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=**1619545264663**
       SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27 18:53:23 IST 2021
    10.XX.138.XX:
       SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 05:30:00 IST 1970, Replication Lag=**1619545263509**
       SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Tue Apr 27 10:31:05 IST 2021
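One possible reading of these numbers, offered as an assumption rather than confirmed HBase behavior: if the reported lag is computed as the current time minus TimeStampsOfLastShippedOp, then a last-shipped timestamp of epoch 0 (Thu Jan 01 05:30:00 IST 1970) would make the lag equal to the current time in epoch milliseconds. The sketch below decodes the first lag value from the output above as a timestamp; the function name `implied_now` is hypothetical and only used for illustration.

```python
from datetime import datetime, timedelta, timezone

# Value copied from the status 'replication' output above.
REPLICATION_LAG_MS = 1619545264329

# IST is UTC+05:30, the zone used in the shell output.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def implied_now(lag_ms, last_shipped_ms=0):
    """Assumption: lag = now - last shipped timestamp.

    Returns the wall-clock time 'now' that this formula would imply.
    """
    return (EPOCH + timedelta(milliseconds=last_shipped_ms + lag_ms)).astimezone(IST)


if __name__ == "__main__":
    # Epoch 0 rendered in IST matches TimeStampsOfLastShippedOp in the output.
    print(implied_now(0).isoformat())
    # The lag value decodes to a time on the day the status was taken.
    print(implied_now(REPLICATION_LAG_MS).isoformat())
```

The decoded time falls on Apr 27 2021 shortly after 23:09 IST, close to the latest TimeStampsOfLastAppliedOp in the output. Under the stated assumption, the large lag may reflect a last-shipped timestamp that was never set rather than data that is actually behind.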

However, all the data is replicated properly to the peer cluster. I have checked the tables in both clusters.

I also ran the VerifyReplication MapReduce job to look for unreplicated rows. It reported none; every row was counted as a good row.

./hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 1 tablename

org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
        GOODROWS=45
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0 

Because of this issue, the znodes under the replication znode keep growing exponentially. This causes problems for the running ZK cluster and eventually affects HBase connections too. The following exception occurs in ZK:

ERROR java.io.IOException: Len error

Increasing jute.maxbuffer in ZK will not solve the problem, because the replication znode keeps growing even though the data is replicated properly to the cluster configured for the peer.

I have enabled two-way replication between the clusters, and the problem happens in both clusters.

HBase version - 1.4.10  
ZK version - 3.4.10  
Hadoop version - 2.7.3

Please help to fix this.


Solution

  • This issue has already been reported in the following JIRA ticket:

    https://issues.apache.org/jira/browse/HBASE-22784

    Upgrading to 1.4.11 fixed the exponential growth of the replication znode.