I am new to MemSQL. I have created a database and tables in MemSQL on a cluster with 5 leaf nodes and 2 aggregator nodes. Spark is running on the same cluster. Everything is in default mode. I inserted data and then deleted it, and select * no longer returns anything. But the web cluster UI shows that each leaf node is still consuming around 6TB of disk space.
The Disk Capacity description says "This is the amount of disk space used by MemSQL relative to total disk space available. When this is full, no snapshot, transaction logs or columnstore data can be created".
Given this description, I assume that the 6TB of disk space above is being used by MemSQL.
Can someone please clarify?
The 6TB usage might be because of MemSQL, or it might be because of Spark or some other process. MemSQL ops reports total disk usage, not disk used by MemSQL (the tooltip is slightly misleading).
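One way to tell the two cases apart is to compare the size of the MemSQL data directory on a leaf with the total usage of the disk it sits on. The following is only a rough sketch: the helper names are made up for illustration, and the data directory path is an assumption you would need to supply for your own install (it is not given anywhere in this thread).

```python
import os
import shutil


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable (e.g. a log was
                # removed while we were walking); skip it.
                continue
    return total


def usage_report(data_dir):
    """Compare one directory's size with the usage of its whole filesystem.

    data_dir is a hypothetical argument: pass the MemSQL data directory
    of a leaf node, wherever your installation keeps it.
    """
    directory_bytes = directory_size_bytes(data_dir)
    disk = shutil.disk_usage(data_dir)  # total, used, free for that filesystem
    share = directory_bytes / disk.used if disk.used else 0.0
    return {
        "directory_bytes": directory_bytes,
        "disk_used_bytes": disk.used,
        "disk_total_bytes": disk.total,
        "directory_share_of_used": share,
    }
```

Calling usage_report() on the data directory of each leaf gives a rough idea of how much of the reported usage is MemSQL's. If directory_share_of_used is small, the space is most likely held by Spark or another process on the same machine.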
1) Rowstore tables (tables WITHOUT a CLUSTERED COLUMNSTORE index) write logs to disk for every write. The logs are combined into snapshots when they become too large, and by default, we keep the last two snapshot files. Thus, it is possible that the older of the two snapshots contains the data you deleted. You can trigger a new snapshot with SNAPSHOT <dbName>, and this will let GC clean up the old (possibly large) ones.
2) Snapshots and logs are per database, not per table. Dropping a table will not trigger snapshot/log cleanup, but dropping the database or triggering a new snapshot will.
3) You probably shouldn't delete data directories by hand. DROP DATABASE <db_name> will delete all data associated with that database.
For columnstore tables, the story is slightly different, but I assume "Everything is default" means no columnstore tables.