Tags: replication, cockroachdb, raft

Setting a replication factor higher than the number of nodes in CockroachDB


I have set up a three-node insecure cluster for testing on my local machine. I created a database and added a table with a few records. I queried the zone configuration, which showed that num_replicas is 3, and the range has replicas on nodes {1, 2, 3}:

root@:26257/foo> show zone configuration for database foo;
     target     |              raw_config_sql
----------------+-------------------------------------------
  RANGE default | ALTER RANGE default CONFIGURE ZONE USING
                |     range_min_bytes = 134217728,
                |     range_max_bytes = 536870912,
                |     gc.ttlseconds = 90000,
                |     num_replicas = 3,
                |     constraints = '[]',
                |     lease_preferences = '[]'
(1 row)

Time: 2ms total (execution 2ms / network 0ms)

root@:26257/foo> show ranges from database foo;
  table_name | start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
-------------+-----------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  bar        | NULL      | NULL    |       36 |      0.000105 |            2 |                       | {1,2,3}  | {"","",""}
(1 row)

Then I changed num_replicas to 5 with the query below. The number of replicas is now greater than the number of nodes available in the cluster, yet I didn't get any error.

 root@:26257/foo> ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5, gc.ttlseconds = 100000;
 CONFIGURE ZONE 1

 Time: 174ms total (execution 174ms / network 0ms)
 root@:26257/foo> show zone configuration for database foo;
     target     |              raw_config_sql
----------------+-------------------------------------------
  RANGE default | ALTER RANGE default CONFIGURE ZONE USING
                |     range_min_bytes = 134217728,
                |     range_max_bytes = 536870912,
                |     gc.ttlseconds = 100000,
                |     num_replicas = 5,
                |     constraints = '[]',
                |     lease_preferences = '[]'
(1 row)

Then I added a node to the cluster and expected the number of replicas for the range to grow. The range was not replicated further; instead, one replica was rebalanced to the new node, giving {1, 2, 4}.

cockroach node ls --insecure
  id
------
   1
   2
   3
   4
   

From the SQL console:

root@:26257/foo> show ranges from database foo;
  table_name | start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
-------------+-----------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  bar        | NULL      | NULL    |       36 |      0.000105 |            2 |                       | {1,2,4}  | {"","",""}
(1 row)

According to the documentation, the replicas column should list the nodes holding replicas for this range. With num_replicas set to 5, shouldn't this column show all 4 nodes? Did I get anything wrong in my understanding or my queries?


Solution

  • Although the CockroachDB docs don't make this clear, a cluster applies a replication factor only once it has at least that many nodes. So after you set the replication factor to 5 (here on RANGE default, which database foo inherits) and add a fourth node, the cluster may rebalance existing replicas onto the new node if that improves balance, but it won't increase the replica count to 5 until there is a fifth node.
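
To make the rule above concrete, here is a minimal Python sketch of how the effective replica count could be derived from num_replicas and the number of available nodes. It is a simplified model and not CockroachDB's actual code: the function name effective_replicas is hypothetical, and the rounding down to an odd count with a floor of 3 is an assumption about how the allocator behaves when the cluster is smaller than the configured factor.

```python
def effective_replicas(num_replicas: int, live_nodes: int) -> int:
    """Estimate how many replicas a range gets (illustrative model only)."""
    # Never plan for more replicas than there are nodes to hold them.
    need = min(num_replicas, live_nodes)

    # Enough nodes for the configured factor: use it as-is.
    if need == num_replicas:
        return need

    # Assumption: when falling back, avoid an even replica count,
    # since an even count adds cost without improving quorum tolerance.
    if need % 2 == 0:
        need -= 1

    # Assumption: the fallback never goes below 3 replicas,
    # and never above what the zone configuration asked for.
    need = max(need, 3)
    return min(need, num_replicas)


if __name__ == "__main__":
    # The scenario from the question: num_replicas = 5, growing cluster.
    for nodes in (3, 4, 5):
        print(nodes, "nodes ->", effective_replicas(5, nodes), "replicas")
```

Under this model, the four-node cluster in the question still holds 3 replicas per range, which matches the observed {1,2,4}, and a fifth node would let the range reach 5.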