Tags: cassandra, cql, cql3, cqlsh

Why does nodetool status *keyspace* still show hundreds of MBs of data after TRUNCATE?


I ran the TRUNCATE command from cqlsh on node .20 for my table.
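
For reference, the statement was of roughly this form (the keyspace name is a placeholder, as elsewhere in this post; the table name is taken from the cfstats output below):

cqlsh> TRUNCATE myKeyspace.data;   -- "data" is the table shown in cfstats below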

Twenty minutes have passed since I issued the command, yet the output of nodetool status *myKeyspace* still shows a lot of data on 4 of the 6 nodes.

I am using Cassandra 3.0.8.

192.168.178.20:/usr/share/cassandra$ nodetool status *myKeyspace*
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.178.24  324,57 MB  256          32,7%             4d852aea-65c7-42e1-b2bd-f38a320ec827  rack1
UN  192.168.178.28  650,86 KB  256          35,7%             82b67dc5-9f4f-47e9-81d7-a93f28a3e9da  rack1
UN  192.168.178.30  155,68 MB  256          31,9%             28cf5138-7b61-42ca-8b0c-e4be1b5418ba  rack1
UN  192.168.178.32  321,62 MB  256          33,3%             64e106ed-770f-4654-936d-db5b80aa37dc  rack1
UN  192.168.178.36  640,91 KB  256          33,0%             76152b07-caa6-4214-8239-e8a51bbc4b62  rack1
UN  192.168.178.20  103,07 MB  256          33,3%             539a6333-c4ef-487a-b1e4-aac40949af4c  rack1

The following command was run on the .24 node. It looks like there are still snapshots/backups being kept somewhere. But the amount of data, 658 MB on node .24, does not match the 324 MB reported by nodetool status. What's going on there?

192.168.178.24:/usr/share/cassandra$ nodetool cfstats *myKeyspace*
Keyspace: *myKeyspace*
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Flushes: 0
                Table: data
                SSTable count: 0
                Space used (live): 0
                Space used (total): 0
                Space used by snapshots (total): 658570012
                Off heap memory used (total): 0
                SSTable Compression Ratio: 0.0
                Number of keys (estimate): 0
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 0
                Local read count: 0
                Local read latency: NaN ms
                Local write count: 0
                Local write latency: NaN ms
                Pending flushes: 0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0,00000
                Bloom filter space used: 0
                Bloom filter off heap memory used: 0
                Index summary off heap memory used: 0
                Compression metadata off heap memory used: 0
                Compacted partition minimum bytes: 0
                Compacted partition maximum bytes: 0
                Compacted partition mean bytes: 0
                Average live cells per slice (last five minutes): 3.790273556231003
                Maximum live cells per slice (last five minutes): 103
                Average tombstones per slice (last five minutes): 1.0
                Maximum tombstones per slice (last five minutes): 1

Note that the keyspace contains no tables other than the one I truncated. There might still be some index data from cassandra-lucene-index, though, if it is not cleared by TRUNCATE.
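
To see where that snapshot space actually lives, something like the following should work (this assumes the default /var/lib/cassandra/data layout; the table directory name includes the table's ID, hence the glob):

# list all snapshots on this node (snapshot name, keyspace, table, size)
nodetool listsnapshots
# assumed default data directory; adjust if data_file_directories is customized
du -sh /var/lib/cassandra/data/myKeyspace/data-*/snapshots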


Solution

  • The keyspace argument to nodetool status is really only used to determine the replication factor and the datacenters to include when computing effective ownership. The Load column reflects all SSTables on the node, not just that one keyspace, just as the IP address, host ID, and token count are unaffected by the keyspace argument. status is more of a global check.

    Space used by snapshots is expected to still contain old data. When you run TRUNCATE, Cassandra first snapshots the data (this can be disabled by setting auto_snapshot to false in cassandra.yaml). To clear all the snapshots, use nodetool clearsnapshot <keyspace>, as sketched below.
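
    A minimal sketch of the cleanup, assuming the keyspace from the question (run the nodetool command on every node, since snapshots are stored per node):

    # remove all snapshots for this keyspace on the current node
    nodetool clearsnapshot myKeyspace

    # in cassandra.yaml, to skip the automatic snapshot on future TRUNCATE/DROP:
    auto_snapshot: false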