
How to add a new CH Keeper node to the existing cluster


I am experimenting and followed the steps from this guide (it is written for ZooKeeper, not CH Keeper, but I guessed it would work): https://www.ibm.com/docs/en/b2b-integrator/6.1.0?topic=setup-joining-new-zookeeper-nodes-existing-nodes
It didn't work; the details are below.

Following those directions, I did this:

  1. have a 3-node cluster (CH3, CH4, CH5) up and running: zk_followers 2
  2. add/start a new node (CH6) with the new 4-node keeper cluster config: <zookeeper> and <keeper_server>
  3. stop/update config/start the followers (CH4 and CH5) one by one with the new 4-node keeper cluster config
  4. stop/update config/start the leader (CH3) with the new 4-node keeper cluster config
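
The `zk_followers 2` figure in step 1 is the kind of value Keeper reports through its `mntr` four-letter command. As a minimal sketch of how that check could be scripted, assuming `mntr` is permitted by the four-letter-word whitelist and that the Keeper client port is 9181 (the port is not stated in the question), the helper names below are hypothetical:

```python
import socket


def four_letter_word(host, port, command="mntr", timeout=5.0):
    """Send a four-letter command to a Keeper client port and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply):
    """Turn 'key<whitespace>value' lines into a dict of strings."""
    stats = {}
    for line in reply.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1].strip()
    return stats
```

Calling `parse_mntr(four_letter_word("CH3", 9181))` against the leader and reading the `zk_followers` key before and after each step would show whether the fourth node was ever counted as a follower.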

However, the cluster doesn't accept the new node, and CH6 logs:

2023.04.19 16:30:26.628691 [ 24960 ] {} <Error> RaftInstance: [PRE-VOTE DONE] this node has been removed, stepping down

A workaround is to delete /var/lib/clickhouse/coordination on all 3 nodes and then deploy (start) the new 4-node cluster.
What did I do wrong? What is the proper way to do this?
CH version: 23.1.3

Detailed log messages: config.xml uses IP addresses, so the actual log messages contain IP addresses. For convenience, I have replaced the IPs with hostnames.

----- from the new node (CH6) -----
2023.04.19 16:30:26.624287 [ 24958 ] {} <Information> RaftInstance: [ELECTION TIMEOUT] current role: follower, log last term 0, state term 0, target p 1, my p 1, hb dead, pre-vote NOT done
2023.04.19 16:30:26.624310 [ 24958 ] {} <Information> RaftInstance: [PRE-VOTE INIT] my id 6, my role candidate, term 0, log idx 0, log term 0, priority (target 1 / mine 1)
2023.04.19 16:30:26.627832 [ 24960 ] {} <Information> RaftInstance: 0x7efe19c5e998 connected to CH5:9444 (as a client)
2023.04.19 16:30:26.627863 [ 24963 ] {} <Information> RaftInstance: 0x7efe19c5e618 connected to CH4:9444 (as a client)
2023.04.19 16:30:26.627917 [ 24962 ] {} <Information> RaftInstance: 0x7efe19c5d118 connected to CH3:9444 (as a client)
2023.04.19 16:30:26.628360 [ 24963 ] {} <Information> RaftInstance: [PRE-VOTE RESP] peer 5 (X), term 0, resp term 0, my role candidate, dead 1, live 0, num voting members 4, quorum 3
2023.04.19 16:30:26.628415 [ 24963 ] {} <Information> RaftInstance: [PRE-VOTE RESP] peer 4 (X), term 0, resp term 0, my role candidate, dead 1, live 0, num voting members 4, quorum 3
2023.04.19 16:30:26.628631 [ 24960 ] {} <Information> RaftInstance: [PRE-VOTE RESP] peer 3 (X), term 0, resp term 0, my role candidate, dead 1, live 0, num voting members 4, quorum 3
2023.04.19 16:30:26.628691 [ 24960 ] {} <Error> RaftInstance: [PRE-VOTE DONE] this node has been removed, stepping down
2023.04.19 16:30:28.409604 [ 24960 ] {} <Information> RaftInstance: stepping down (cycles left: 1), skip this election timeout event
2023.04.19 16:30:30.039775 [ 24966 ] {} <Information> KeeperDispatcher: Server still not initialized, will not apply configuration until initialization finished
2023.04.19 16:30:30.379911 [ 24959 ] {} <Information> RaftInstance: no hearing further news from leader, remove this server from cluster and step down
2023.04.19 16:30:35.041064 [ 24966 ] {} <Information> KeeperDispatcher: Server still not initialized, will not apply configuration until initialization finished
2023.04.19 16:30:40.041253 [ 24966 ] {} <Information> KeeperDispatcher: Server still not initialized, will not apply configuration until initialization finished

----- from the leader (CH3) ----- similar messages in CH4 and CH5
2023.04.19 16:30:26.627879 [ 1597 ] {} <Information> RaftInstance: receive a incoming rpc connection
2023.04.19 16:30:26.627983 [ 1597 ] {} <Information> RaftInstance: session 3 got connection from ::ffff:CH6:38734 (as a server)
2023.04.19 16:30:26.628584 [ 1602 ] {} <Information> RaftInstance: [PRE-VOTE REQ] my role leader, from peer 6, log term: req 0 / mine 1
last idx: req 0 / mine 153, term: req 0 / mine 1
HB alive
2023.04.19 16:30:26.628692 [ 1602 ] {} <Information> RaftInstance: pre-vote decision: XX (strong deny, non-existing node)
2023.04.19 16:30:30.383029 [ 1599 ] {} <Error> RaftInstance: session 3 failed to read rpc header from socket ::ffff:CH6:38734 due to error 2, End of file, ref count 1

Solution

  • It's really hard to say what went wrong without seeing the configs.

    The first thing I would like you to try is changing the config while the cluster is running. Maybe the diff is not calculated correctly during startup. Or you may simply have to wait a bit until the cluster applies everything correctly.

    To give you a better understanding of Keeper, I'll also try to explain how cluster reconfiguration is currently done.

    The cluster configuration is stored in 2 places: RAFT itself and the config you define.

    In a background thread, Keeper checks the diff between those 2, and if the config has servers added or removed, the leader proposes the changes to the cluster one by one.

    Adding or removing a node also has to go through the consensus protocol, so there may be some delay, but not a big one.

    In the leader logs, you should see something like this: Will try to add server with id server_id

    If something went wrong, there should also be a log message explaining it.

    Follower nodes will have a similar log but with a message that they are waiting for some server to be added/removed.

    Another thing to be careful of is to use unique IDs for new servers. Also, try not to reuse previously used IDs that have since been removed.
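
The reconfiguration loop described above can be pictured with a small toy model. This is only an illustrative sketch, not Keeper's actual implementation: the `Server` class and the `config_diff` and `check_ids` functions are hypothetical names, and the member lists reuse the hostnames and port 9444 from the question's logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    server_id: int
    endpoint: str  # "host:port"


def config_diff(raft_members, file_members):
    """List the (action, server) steps that move the RAFT view toward the config file."""
    raft_by_id = {s.server_id: s for s in raft_members}
    file_by_id = {s.server_id: s for s in file_members}
    steps = []
    # Servers present in the file but unknown to RAFT must be added.
    for sid in sorted(file_by_id.keys() - raft_by_id.keys()):
        steps.append(("add", file_by_id[sid]))
    # Servers known to RAFT but missing from the file must be removed.
    for sid in sorted(raft_by_id.keys() - file_by_id.keys()):
        steps.append(("remove", raft_by_id[sid]))
    return steps


def check_ids(file_members, retired_ids=frozenset()):
    """Return (duplicate IDs, IDs that reuse a previously removed server's ID)."""
    ids = [s.server_id for s in file_members]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    reused = sorted(set(ids) & set(retired_ids))
    return duplicates, reused


# What RAFT currently holds versus what the updated config file declares.
raft_members = [Server(3, "CH3:9444"), Server(4, "CH4:9444"), Server(5, "CH5:9444")]
file_members = raft_members + [Server(6, "CH6:9444")]

steps = config_diff(raft_members, file_members)
for action, server in steps:
    # Each step is proposed separately, one at a time, through consensus.
    print(f"leader proposes: {action} server {server.server_id} ({server.endpoint})")
```

In this model, the new node only becomes a member once the leader has proposed the "add" step and the existing members have agreed to it. Until then the running cluster treats ID 6 as unknown, which fits the "non-existing node" denial in the leader log from the question.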