We have a 10-node Cassandra cluster. We configured a repair in OpsCenter. We find there is a backups folder created for every table in the OpsCenter keyspace, and these folders keep growing. Is there a solution to this, or do we have to manually delete the data in each backups folder?
First off, backups are different from snapshots. You can take a look at the backup documentation for OpsCenter to learn more.
From the DataStax docs:
When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows storing backups offsite without transferring entire snapshots. Also, incremental backups combine with snapshots to provide a dependable, up-to-date backup mechanism. ... As with snapshots, Cassandra does not automatically clear incremental backup files. DataStax recommends setting up a process to clear incremental backup hard-links each time a new snapshot is created.
You must have turned on incremental backups by setting incremental_backups to true in cassandra.yaml.
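The docs quoted above recommend setting up a process to clear incremental backup hard links. Here is a minimal sketch of such a process in Python, not an official tool. It assumes the usual on-disk layout of <data_dir>/<keyspace>/<table>/backups; the function name, the age threshold, and the dry-run default are my own choices for illustration.

```python
import time
from pathlib import Path


def clear_incremental_backups(data_dir, keyspace, min_age_seconds=0, dry_run=True):
    """Remove files under <data_dir>/<keyspace>/<table>/backups.

    Only files whose modification time is at least min_age_seconds old are
    considered. Returns the paths that were removed (or, when dry_run is
    True, the paths that would have been removed).
    """
    cutoff = time.time() - min_age_seconds
    removed = []
    keyspace_dir = Path(data_dir) / keyspace
    # Assumed layout: one directory per table, each with an optional "backups" dir.
    for backups_dir in sorted(keyspace_dir.glob("*/backups")):
        if not backups_dir.is_dir():
            continue
        for entry in sorted(backups_dir.iterdir()):
            # Only regular files directly inside backups/ are touched.
            if entry.is_file() and entry.stat().st_mtime <= cutoff:
                if not dry_run:
                    # Unlinking removes the hard link only; the live SSTable
                    # in the table directory is a separate link and stays.
                    entry.unlink()
                removed.append(str(entry))
    return removed
```

You would run it first with dry_run=True to review the list, for example clear_incremental_backups("/var/lib/cassandra/data", "OpsCenter", min_age_seconds=86400), where the data directory path is an assumption about your install. Only delete once the files have been copied wherever you need them.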
If you are interested in a backup strategy, I recommend you use the OpsCenter Backup Service instead. That way, you're able to control granularly which keyspace you want to back up and push your files to S3.
Snapshots are hard links to SSTables as they existed when the snapshot was taken (including ones that are no longer in use). Snapshots protect you from yourself. For example, if you accidentally truncate the wrong table, you'll still have a snapshot for that table that you can bring back. If you end up with too many snapshots, there are a couple of things you can do:
This is related to repairs because sequential repairs generate a snapshot each time they run. To avoid this, you should run parallel repairs instead (the -par flag, or by setting the number of repairs in the OpsCenter config file; see the note below).
If you have too many snapshots and need to free up space (maybe once you have backed them up to S3 or Glacier or something), go ahead and use nodetool clearsnapshot to delete them. You can also go in and remove them manually from your file system, but nodetool clearsnapshot removes the risk of rm -rf-ing the wrong thing.
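Before clearing anything, it can help to see which snapshots are taking the space. The following is a rough read-only sketch in Python; it assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>, and the function name is hypothetical.

```python
from pathlib import Path


def snapshot_sizes(data_dir):
    """Return {(keyspace, table, snapshot_name): total_bytes} for each snapshot found."""
    sizes = {}
    # Assumed layout: <data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>/<files>
    for snap_dir in Path(data_dir).glob("*/*/snapshots/*"):
        if not snap_dir.is_dir():
            continue
        keyspace, table = snap_dir.parts[-4], snap_dir.parts[-3]
        # Sum the apparent size of every regular file in this snapshot.
        total = sum(f.stat().st_size for f in snap_dir.rglob("*") if f.is_file())
        sizes[(keyspace, table, snap_dir.name)] = total
    return sizes
```

Because snapshot files are hard links, these totals are apparent sizes: a file that is still a live SSTable shares its disk blocks with the snapshot copy, so the space you actually reclaim can be smaller than the number reported.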
Note: You may also be running repairs too fast if you don't have a ton of data (check my response to this other SO question for an explanation and the repair service config levers).