Tags: autoscaling, crate, rancher

autoscale a crate cluster


I'm playing around with deploying Crate in a Rancher environment.

It's working fine, but I have issues with two config params:

gateway.expected_nodes and gateway.recover_after_nodes.

What is the best practice regarding these two when it comes to scaling Crate up and down?

/hw


Solution

  • The settings gateway.expected_nodes and gateway.recover_after_nodes are only relevant during node startup.

    • scale-down: After you've removed some nodes, you should update the configuration to reflect the new number of nodes in the cluster, but you don't need to restart.

    • scale-up: You should change the settings to the number of nodes you're going to have, and do so before you start the new nodes. But you don't need to restart the existing nodes.

    For a running node/cluster these values don't have any effect at all, which is why you don't necessarily have to restart (but the values should be correct in case you do restart). They're only relevant during start-up: they control whether the node that is just starting should recover the data from its filesystem, or whether it should wait for other nodes in the cluster and receive the data from them.
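    For concreteness, here is a minimal crate.yml sketch of these two settings. The 3-node cluster size and the exact values are just assumptions for illustration; when scaling up you would raise them on every node before starting the new nodes, and when scaling down you would lower them afterwards.

    ```yaml
    # crate.yml -- minimal sketch, assuming a 3-node cluster (example values)
    gateway.expected_nodes: 3       # recovery from the local filesystem starts as soon as all 3 expected nodes have joined
    gateway.recover_after_nodes: 2  # never start recovery before at least 2 nodes have joined
    ```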

    For example, suppose you have 2 nodes: N1 and N2.

    • You create a table
    • You stop N2
    • You delete the table (on N1)
    • You start N2
    • N2 reads the gateway settings - they are wrong, so it thinks it's going to be the only node in the cluster and recovers the table from its filesystem, because it doesn't know that the table was deleted on N1 (it doesn't know about N1 yet)
    • N2 eventually joins N1
    • The table is back in the cluster
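    The root cause in that scenario is that N2 starts with gateway settings describing a one-node cluster. A sketch of what both nodes should carry for this 2-node example (values are specific to the example), so that a freshly started N2 waits for N1 instead of recovering the stale table from its own filesystem:

    ```yaml
    # crate.yml on N1 and N2 -- sketch for the 2-node example
    gateway.expected_nodes: 2       # only recover from local disk once both nodes have joined
    gateway.recover_after_nodes: 2  # a lone node waits for the other node instead of recovering on its own
    ```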

    Update

    Should I care about the warning in admin when all nodes being started or restarted will have the correct settings?

    If they will have the correct settings on a (re)start, the warnings can be ignored.