
Why does nodetool refresh require a restart of Cassandra 2.2.8?


The documentation clearly states that nodetool refresh loads newly placed SSTables into the system without a restart. The environment is simple: a 6-node cluster with 2 DCs, 3 nodes per DC, and the keyspace in question uses RF 3 in each DC. Yes, I'm using C* 2.2.8, and there's nothing I can do about that at the moment. Of course, this never goes sideways on a single-node or two-node cluster [no network strategy in use]. :)

  1. Create a snapshot on a fully repaired, schema stable, and unchanging test keyspace. Hard links exist in the snapshot folder as expected.
  2. Add data to a column family and flush it, which results in two SSTables. The snapshot holds a hard link to sstable1, and the backups folder now has a hard link to sstable2, which the flush created.
  3. The data is truncated via cqlsh, or the on-disk files are purged.
  4. The snapshot files are copied or hard linked to the data directory. Both approaches result in the same behavior. Either way, only sstable1 ends up in the data directory.
  5. All caches are invalidated (only key caching is in use). [or not]
  6. nodetool flush is used to prove that nothing is in memory. [or not]
  7. nodetool refresh is used to reload the newly placed sstable1.
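
The restore part of the steps above (3 through 7) can be sketched as a small shell script. This is only a sketch under stated assumptions, not the exact procedure I ran: the keyspace, table, snapshot tag, and data directory path are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is my own addition so the commands are printed rather than executed by default. Steps 4 through 7 would presumably be repeated on every node.

```shell
#!/bin/sh
# Sketch of steps 3-7. All names below are hypothetical placeholders.
KEYSPACE="${KEYSPACE:-test_ks}"
TABLE="${TABLE:-test_cf}"
SNAPSHOT_TAG="${SNAPSHOT_TAG:-baseline}"
# Assumed path; the real table directory name ends with a table-ID suffix.
TABLE_DIR="${TABLE_DIR:-/var/lib/cassandra/data/$KEYSPACE/$TABLE}"
# DRY_RUN=1 (the default) only prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restore_from_snapshot() {
    # Step 3: remove the current data.
    run cqlsh -e "TRUNCATE $KEYSPACE.$TABLE;"
    # Step 4: copy the snapshot's files back into the table directory.
    run cp -p "$TABLE_DIR/snapshots/$SNAPSHOT_TAG"/* "$TABLE_DIR/"
    # Step 5: invalidate the key cache.
    run nodetool invalidatekeycache
    # Step 6: flush, to show that nothing is left in memory.
    run nodetool flush "$KEYSPACE" "$TABLE"
    # Step 7: ask the node to load the newly placed SSTables.
    run nodetool refresh "$KEYSPACE" "$TABLE"
}
```

Calling `restore_from_snapshot` with the defaults prints the five commands for review; setting `DRY_RUN=0` would actually run them, including the TRUNCATE.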

At this point, querying each node with local_one consistency returns rows from both sstable1 and sstable2 on every node.

If a snapshot is attempted (or a repair, because of the pre-repair snapshot), it fails trying to snapshot the non-existent sstable2.
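
One rough way to look for this mismatch before a snapshot or repair trips over it is to compare the number of SSTable data files on disk with the "SSTable count" that nodetool cfstats reports for the table. The sketch below assumes the usual `*-Data.db` file naming; the function names and arguments are hypothetical, and whether cfstats actually reflects the stale sstable2 in this situation is an assumption I have not verified.

```shell
#!/bin/sh
# Count SSTable data files directly inside a table directory
# (files under snapshots/ and backups/ are not counted).
count_sstables_on_disk() {
    find "$1" -maxdepth 1 -type f -name '*-Data.db' | wc -l | tr -d ' '
}

# Read `nodetool cfstats` output on stdin and print the first
# "SSTable count" value found.
parse_sstable_count() {
    awk -F': *' '/SSTable count/ { print $2; exit }'
}

# Args: <keyspace.table> <table directory>
# Succeeds only if the node's reported count matches the files on disk.
check_table() {
    live="$(nodetool cfstats "$1" | parse_sstable_count)"
    on_disk="$(count_sstables_on_disk "$2")"
    echo "live=$live on_disk=$on_disk"
    [ "$live" = "$on_disk" ]
}
```

For example, `check_table test_ks.test_cf /path/to/table/dir` (both arguments are placeholders) prints the two counts and returns a non-zero status when they differ.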

After a restart of the cluster, all is well and as expected.

Is nodetool refresh buggy for C* 2.2.8 as compared to nodetool drain?

My start and stop process for C* is as follows.

Start:

  1. systemctl start cassandra.service

Stop:

  1. nodetool disablegossip
  2. nodetool disablethrift
  3. nodetool disablebinary
  4. nodetool drain
  5. nodetool stopdaemon <= expecting an error
  6. systemctl stop cassandra.service

... and of course, a restart is a stop and a start. :)

Thank you in advance.


Solution

  • I was using C* 2.2.8 from the DataStax repo. Now I'm using Apache C* 2.2.11 from the Apache repo, and I can no longer reproduce this issue.

    Summary of the issue...

    1. create snapshot
    2. restore snapshot, nodetool refresh, no errors in the logs
    3. sstables are not replaced (phantom data [not a tombstone or commit log issue])
    4. subsequent snapshot fails because of missing sstable files (which are not in the snapshot created)
    5. after restarting C* the sstables are replaced and new snapshots no longer produce the error for any missing sstable files (which are not in the snapshot created)

    My conclusion is that C* 2.2.8 has a bug and 2.2.11 does not. Also, since DataStax no longer supports C*, avoid trusting them for anything that's not DSE. They are not to be trusted the way Apache is.