Can I just copy the snapshot from an existing node to a new machine to restore data?


I'm studying backup and restore in Cassandra. The DataStax doc (Cassandra restore) mentions two ways of restoring data:

  1. Restoring from local nodes
  2. Restoring from centralized backups

My question is: can I just take a snapshot of my existing node, copy the output files (file.db) to a newly started node, and put them into the right directory under /var/lib/cassandra/data? Is this the right way?

Or is there anything I am missing?


Solution

  • You cannot simply copy snapshots from one node to another, because the two nodes won't necessarily own the same token range(s), so the data in the snapshots would be useless to an arbitrary node.

    In Cassandra, data is distributed across the nodes in a cluster, with each node owning a range of tokens. Each partition (a record) has a partition key (the first part of the primary key), which the partitioner hashes into a token value (Murmur3Partitioner is the default); that token determines which node(s) the partition is stored on.
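    To make the key-to-token-to-node mapping concrete, here is a toy sketch. It assumes a single token per node (no vnodes) and uses an MD5-based hash from the Python standard library as a stand-in; it is not Cassandra's Murmur3Partitioner, and `token_for` and `TokenRing` are hypothetical names. Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around the ring.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Stand-in partitioner (NOT Murmur3): hash a key to a signed 64-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Hypothetical ring: each node owns (previous node's token, its own token]."""

    def __init__(self, node_tokens):
        # Sort nodes by token so the owning range can be found by binary search.
        pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def owner(self, token):
        # First node whose token is >= the key's token; a token above the
        # highest node token wraps around to the node with the lowest token.
        i = bisect.bisect_left(self.tokens, token)
        return self.nodes[i % len(self.nodes)]


ring = TokenRing({"node1": -6_000_000_000_000_000_000,
                  "node2": 0,
                  "node3": 6_000_000_000_000_000_000})
for key in ["alice", "bob", "carol"]:
    t = token_for(key)
    print(key, t, ring.owner(t))
```

    With vnodes (`num_tokens` in cassandra.yaml) each node owns many small ranges instead of one, which makes it even less likely that two nodes hold the same partitions. On a real cluster, `nodetool ring` lists the tokens each node owns.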

    The SSTables in a snapshot only contain partitions whose tokens fall within the ranges owned by (or replicated to) that node. If you copy the snapshot to a node that owns different token ranges, some or all of the partitions in those SSTables are effectively lost, because the cluster will never route reads for those partitions to a node that doesn't own them.
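    The consequence can be sketched with toy numbers. The ranges below are hypothetical `(start, end]` pairs on a tiny ring, wrap-around and replication are ignored, and `owns` and `unreachable_after_copy` are illustrative names only.

```python
# Hypothetical token ranges, written as (start, end] pairs on a toy ring.
OLD_NODE_RANGES = [(0, 100)]
NEW_NODE_RANGES = [(50, 150)]


def owns(ranges, token):
    """True if the token falls inside any of the node's (start, end] ranges."""
    return any(start < token <= end for start, end in ranges)


def unreachable_after_copy(snapshot_tokens, new_ranges):
    """Tokens whose partitions sit on the new node's disk but will never be read there."""
    return [t for t in snapshot_tokens if not owns(new_ranges, t)]


# Toy partitions taken from the old node's snapshot: all owned by the old node.
snapshot_tokens = [10, 42, 60, 99]
assert all(owns(OLD_NODE_RANGES, t) for t in snapshot_tokens)
print(unreachable_after_copy(snapshot_tokens, NEW_NODE_RANGES))  # [10, 42]
```

    On a live cluster, `nodetool getendpoints <keyspace> <table> <key>` shows which nodes are replicas for a given partition key, which is a quick way to check whether a key from the old node's snapshot would be served by the new node.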

    If you want to restore data to a node with an identical configuration (including token ownership), you can use the "refresh method" (copy the SSTables into the table's data directory, then run nodetool refresh). I've documented the detailed procedure in How to restore snapshots to a cluster with identical configuration.
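    A minimal sketch of the file-copy part of the refresh method, assuming the target node already has the same tokens and the table schema, and that the snapshot sits in the usual `<table>-<id>/snapshots/<tag>/` directory. `plan_refresh` is a hypothetical helper: it copies the SSTable components into the live table directory, skips `manifest.json` and `schema.cql`, and returns the `nodetool refresh <keyspace> <table>` command instead of running it. The keyspace, table and file names in the demo are placeholders.

```python
import shutil
import tempfile
from pathlib import Path

# Files that snapshots write next to the SSTables but that do not belong in the
# live data directory (schema.cql only exists in newer Cassandra versions).
NON_SSTABLE_FILES = {"manifest.json", "schema.cql"}


def plan_refresh(snapshot_dir, table_dir, keyspace, table):
    """Hypothetical helper: copy SSTable components from a snapshot into the
    target node's live table directory and return the refresh command."""
    copied = []
    for f in sorted(Path(snapshot_dir).iterdir()):
        if f.is_file() and f.name not in NON_SSTABLE_FILES:
            # copy2 preserves timestamps; file ownership still has to be fixed separately.
            shutil.copy2(f, Path(table_dir) / f.name)
            copied.append(f.name)
    # nodetool refresh loads newly placed SSTables without restarting the node.
    return copied, ["nodetool", "refresh", keyspace, table]


if __name__ == "__main__":
    # Dry run against throwaway directories instead of /var/lib/cassandra/data.
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp) / "snapshots" / "my_tag"
        live = Path(tmp) / "live_table"
        snap.mkdir(parents=True)
        live.mkdir()
        for name in ["nb-1-big-Data.db", "nb-1-big-TOC.txt", "manifest.json"]:
            (snap / name).write_text("placeholder")
        print(plan_refresh(snap, live, "my_keyspace", "my_table"))
```

    Returning the command rather than executing it keeps the sketch safe to dry-run; on a real node you would pass it to `subprocess.run(cmd, check=True)` after making sure the copied files are owned by the user Cassandra runs as.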

    Otherwise, you will need to bulk-load the snapshots with the sstableloader utility so that the data in the SSTables is streamed to the nodes that own it (the replicas). I've documented the detailed procedure in How to clone data to a new cluster. Cheers!
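    sstableloader takes the keyspace and table names from the last two components of the directory you point it at, so snapshot files are usually staged into a `<keyspace>/<table>/` directory first. The sketch below does that and builds the command line; `stage_for_sstableloader` is a hypothetical helper, the host addresses and names are placeholders, and the table's schema is assumed to already exist on the target cluster.

```python
import shutil
import tempfile
from pathlib import Path

# Snapshot metadata files that are not SSTable components.
NON_SSTABLE_FILES = {"manifest.json", "schema.cql"}


def stage_for_sstableloader(snapshot_dir, staging_root, keyspace, table, hosts):
    """Hypothetical helper: copy SSTables into <staging_root>/<keyspace>/<table>/
    and return an sstableloader command that streams them to the cluster."""
    target = Path(staging_root) / keyspace / table
    target.mkdir(parents=True, exist_ok=True)
    for f in sorted(Path(snapshot_dir).iterdir()):
        if f.is_file() and f.name not in NON_SSTABLE_FILES:
            shutil.copy2(f, target / f.name)
    # -d gives the initial contact points; sstableloader then learns the ring
    # and sends each partition to the replicas that own its token.
    return ["sstableloader", "-d", ",".join(hosts), str(target)]


if __name__ == "__main__":
    # Dry run against throwaway directories; no Cassandra node is contacted.
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp) / "snapshot"
        snap.mkdir()
        (snap / "nb-1-big-Data.db").write_text("placeholder")
        cmd = stage_for_sstableloader(snap, Path(tmp) / "staging",
                                      "my_keyspace", "my_table",
                                      ["10.0.0.1", "10.0.0.2"])
        print(" ".join(cmd))
```

    Because sstableloader routes every partition by its token, the source snapshot can come from any node, and the target cluster can have a different number of nodes or a different token layout.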