Why shouldn't you run nodetool removenode?

I am wondering why it is a best practice to not run nodetool removenode. What is it used for? Is there a hierarchy of commands to run instead? What kind of issues arise when running said command? Any first hand experience/nightmare stories of using removenode? Overarching why not?

Solution

A default preference order would be:

Replacement node option (if replacement is planned)
Decomission
RemoveNode
Assassinate

However - there are situations where you will still choose a lower entry over an earlier one.

If the node being removed is operational, then you would normally run a decommission and allow the node to stream the data from itself to other nodes which will now be holding one of the replicas that was previously on the node being removed.

Removing a node will cause the token ranges to be recalculated and move, potentially requiring all nodes to start streaming data to other nodes who now own that range.

If the node is not operational, you can perform a nodetool removenode - this will trigger the same range movement and cause a large amount of streaming. There are streaming throughput throttles that are in place by default and can be adjusted to limit this impact.

You can also forcibly terminate either a decommission or a removenode by using nodetool [decommission | removenode] force - however, this means that one of the replicas of the data has not been re-established to another node, leaving you with less resilience.

Why would you do that? for the same streaming reason, if you accept the loss of resilience for a period of time, you can roll out the repair node by node in a controlled manner. This option should not be considered your 'default approach' or the choice taken lightly - I cannot stress or make that in bold enough.

The final option, when decommission / removenode is not available, is to assassinate the node - this is pretty much the same as performing a removenode, followed by an immediate force. You then have to manage to repairs and clean up in the same manner.

Outside of all of these 3 options - the best option is if your intention is to replace the node, then performing a replacement instead of a remove / add is the winner - this will only then require the new node has data streamed to it from the other replica's and there is no further token ring range movement. Instructions here

If the data disks are available, it is also possible to bring the replacement up without streaming the data, instructions here