Tags: amazon-web-services, amazon-s3, cassandra, disk

Back up Cassandra to another disk


I'm trying to back up my Cassandra cluster to AWS S3, and found this tool, which seems to do the job:
https://github.com/tbarbugli/cassandra_snapshotter/
The problem is that in our current cluster we can't afford to keep snapshots on the same disk as the actual data, because we are using SSDs with limited space.
I've also looked at the nodetool snapshot documentation, but I didn't find any option to change the snapshots directory.
So, how can I back up Cassandra to another disk without using the data disk?


Solution

  • Cassandra snapshots are just hard links to all the live sstables at the moment you take the snapshot, so initially they don't take up any additional space on disk. As time passes, compaction replaces the old sstables with new ones. The snapshot's hard links then keep the old files on disk, and from that point on the snapshot counts against your storage space.

    Generally you will take a snapshot to get a consistent view of the database at a given point in time and then use an external tool or script to copy that backup to external storage (and finally clean up the snapshot).

    There is no additional tool provided with Cassandra to handle copying the snapshots to external storage. This isn't too surprising, as backup strategies vary a lot across companies.
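
The snapshot / copy / clean-up cycle described above could be scripted roughly as follows. This is only a sketch under some assumptions: snapshots live under <data_dir>/<keyspace>/<table>/snapshots/<tag> (check data_file_directories in cassandra.yaml for your actual data directory), nodetool is on the PATH, and Python 3.8+ is available. The function names, the tag, and the destination path are made up for illustration. Because the snapshot is cleared as soon as the copy finishes, the hard links only exist for a short window and should take little or no extra space on the SSD.

```python
import shutil
import subprocess
from pathlib import Path


def find_snapshot_dirs(data_dir, tag):
    """Return every <keyspace>/<table>/snapshots/<tag> directory under data_dir."""
    pattern = f"*/*/snapshots/{tag}"
    return sorted(p for p in Path(data_dir).glob(pattern) if p.is_dir())


def copy_snapshot(data_dir, tag, dest_root):
    """Copy the files of snapshot `tag` to dest_root, keeping keyspace/table paths."""
    data_dir, dest_root = Path(data_dir), Path(dest_root)
    copied = []
    for snap in find_snapshot_dirs(data_dir, tag):
        # snap is data_dir/<keyspace>/<table>/snapshots/<tag>;
        # two levels up gives the <keyspace>/<table> part of the path.
        keyspace_table = snap.parent.parent.relative_to(data_dir)
        target = dest_root / tag / keyspace_table
        shutil.copytree(snap, target, dirs_exist_ok=True)
        copied.append(target)
    return copied


def backup(data_dir, tag, dest_root, nodetool="nodetool"):
    """Take a snapshot, copy it to the other disk, then release the hard links."""
    subprocess.run([nodetool, "snapshot", "-t", tag], check=True)
    try:
        return copy_snapshot(data_dir, tag, dest_root)
    finally:
        # Clear the snapshot even if the copy failed, so the old sstables
        # do not stay pinned on the data disk.
        subprocess.run([nodetool, "clearsnapshot", "-t", tag], check=True)
```

On a node this might be called as backup("/var/lib/cassandra/data", "nightly", "/mnt/backup"), where both paths are placeholders. The copy step is the part you would swap out for an upload to S3 or whatever external storage you use.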