I'm working on a single Cassandra 3.11.2 node (RHEL 6.5). In a keyspace named 'test', I have a table also named 'test'. I inserted some rows via cqlsh and then ran nodetool flush. I checked the data directory to confirm that an SSTable had been created. I then deleted all the .db files from the test.test data directory using rm *.db. Strangely, I can still see all the rows in cqlsh! I don't understand how this is happening, since I manually deleted the SSTable.
Given below is my keyspace:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
Given below is the table:
CREATE TABLE test.test (
    aadhar_number int PRIMARY KEY,
    address text,
    name text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
Given below is the output of the nodetool tablestats command (run after I had deleted the SSTable):
Keyspace : test
Read Count: 0
Read Latency: NaN ms
Write Count: 13
Write Latency: 0.11269230769230769 ms
Pending Flushes: 0
Table: test
SSTable count: 1
Space used (live): 5220
Space used (total): 5220
Space used by snapshots (total): 0
Off heap memory used (total): 48
SSTable Compression Ratio: 0.7974683544303798
Number of partitions (estimate): 255
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 4
Local read count: 0
Local read latency: NaN ms
Local write count: 10
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 24
Bloom filter off heap memory used: 16
Index summary off heap memory used: 16
Compression metadata off heap memory used: 16
Compacted partition minimum bytes: 18
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 36
Average live cells per slice (last five minutes): 5.0
Maximum live cells per slice (last five minutes): 5
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
I restarted Cassandra, and only then did the data stop showing in cqlsh.
A very good article for understanding filesystem details in linux.
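On Linux, a filename is just a directory entry (a hard link) that points to an inode, and the inode in turn describes where the file's data lives on disk. When Cassandra opens an SSTable, the process holds its own reference to that inode through an open file descriptor. When you use rm, you only remove the directory entry; the inode and its data blocks are freed only once no links and no open descriptors remain. Because the file is still referenced by a live process, it is not actually deleted, and Cassandra keeps serving reads from it. You can check this with lsof (list open files): lsof -p <pid> lists the files held open by a given process, and unlinked files are marked (deleted). You can find the Cassandra pid with something like ps aux | grep cassandra.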
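When you restart Cassandra, the process closes its file descriptors, the last reference to each file goes away, and the files are actually deleted. That is why the rows disappeared only after the restart.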
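If you want to see this behaviour in isolation, here is a minimal Python sketch that has nothing to do with Cassandra itself. It assumes a Linux machine. The function name unlink_while_open and the "-Data.db" suffix are made up for illustration; the temporary file is only a stand-in for an SSTable. It opens a file, removes its name the way rm does, and then shows that the open descriptor can still read the data, and that /proc reports the target as deleted, much like lsof does.

```python
import os
import tempfile


def unlink_while_open(payload: bytes) -> dict:
    """Create a file, remove its name while it is still open, then read it back."""
    # Hypothetical stand-in for an SSTable data file.
    fd, path = tempfile.mkstemp(suffix="-Data.db")
    try:
        os.write(fd, payload)
        os.unlink(path)                     # what `rm` does: drops the directory entry
        name_exists = os.path.exists(path)  # the name is gone from the directory

        # The open descriptor still reaches the inode, so the bytes are readable.
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))

        # Linux-only: /proc/self/fd/<n> is a symlink to whatever the descriptor
        # points at; for an unlinked file the target ends in " (deleted)".
        proc_link = f"/proc/self/fd/{fd}"
        target = os.readlink(proc_link) if os.path.lexists(proc_link) else None
    finally:
        # Closing the last descriptor lets the kernel free the file for real,
        # which is the equivalent of restarting the process that held it open.
        os.close(fd)
    return {"name_exists": name_exists, "data": data, "proc_target": target}


if __name__ == "__main__":
    print(unlink_while_open(b"row 1\nrow 2\n"))
```

The same pattern applies to any long-running process: as long as it keeps the descriptor open, the file stays alive even though ls no longer shows it.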