MemSQL columnstore data not deleted from disk after TRUNCATE or DROP TABLE


I created a columnstore table in MemSQL and populated it with around 10 million records, after which I started running several update scenarios. I noticed that the size of the data in /var/lib/memsql/leaf-3307/data/columns keeps increasing and nothing there ever seems to be deleted. Initially the folder is a couple hundred MB, but it quickly grows to a couple of GB after a few full-table updates. The "Columnstore Disk Usage" reported by memsql-ops also increases, but at a much slower pace, far below what I see on disk.

This makes me think that data is never actually deleted from disk. The documentation states that running the OPTIMIZE command should compact the row segment groups and that deleted rows would be removed:

Delete - Deleting a row in a columnstore index causes the row to be marked as deleted in the segment meta data leaving the data in place within the row segment. Segments which only contain deleted rows are removed, and the optimization process covered below will compact segments that require optimization.

Running the OPTIMIZE command didn't help. I also tried truncating the table and even dropping it, but nothing helped: the data in the columns folder is still there. The only way I could find to clean it up is to DROP the entire database.

This doesn't seem like the desired behavior and I can't find any documentation justifying it. Can anybody explain why this is happening and whether it should happen, or point me to some relevant documentation?

Thanks in advance


Solution

  • MemSQL keeps up to columnstore_window_size bytes of deleted columnstore data on disk per partition database. This is part of the implementation of columnstore replication: it keeps some old files around in case slaves are behind. If you lower the value of that system variable, you'll see the disk usage drop. If you're not using redundancy level 2, there is no harm in lowering it.
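Because the retention applies per partition database, the leftover files on a leaf scale with the number of partitions it hosts, which helps explain why the columns folder can reach several GB even after TRUNCATE or DROP TABLE. The sketch below is a rough upper-bound model only, assuming each partition database retains at most columnstore_window_size bytes of deleted data. The function name, the window sizes and the partition count are hypothetical illustration values, not MemSQL defaults.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def retained_deleted_bytes(window_size_bytes: int, partition_dbs_on_leaf: int) -> int:
    """Hypothetical helper: rough upper bound on deleted columnstore data
    kept on one leaf, assuming each partition database may retain up to
    window_size_bytes of old files for replication."""
    if window_size_bytes < 0 or partition_dbs_on_leaf < 0:
        raise ValueError("window size and partition count must be non-negative")
    return window_size_bytes * partition_dbs_on_leaf


if __name__ == "__main__":
    # Hypothetical layout: 8 partition databases on a single leaf.
    partitions = 8
    for window in (2 * GIB, 256 * MIB):
        total = retained_deleted_bytes(window, partitions)
        print(f"window={window / MIB:.0f} MiB -> up to {total / GIB:.1f} GiB retained")
```

With these assumed numbers, the script prints "window=2048 MiB -> up to 16.0 GiB retained" and then "window=256 MiB -> up to 2.0 GiB retained". This shows why lowering the variable shrinks the on-disk footprint roughly in proportion.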