
MonetDB Full Disk How To Manually Free Space


My question is similar to this one: essentially, I forgot a clause in a join when using MonetDB, and it produced an enormous result that filled the disk on my computer. MonetDB didn't clean up after this, and despite freeing space and waiting 24 hours, the disk is still much fuller than it should be.

Below is the size of the database as reported by MonetDB (in GB):

sql>SELECT CAST(SUM(columnsize) / POWER(1024, 3) AS INT) columnSize FROM STORAGE();
+------------+
| columnsize |
+============+
|        851 |
+------------+
1 tuple
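SUM(columnsize) counts only the main column files. The sketch below also adds the string-heap and index sizes and runs the query from the shell through mclient. It assumes the database is called warehouse, that mclient can log in non-interactively (for example via a .monetdb credentials file), and that storage() exposes heapsize, hashes and imprints columns; check those names against your version.

```shell
#!/bin/sh
# Sketch: total storage reported by MonetDB, including string heaps and indexes.
# Assumptions: database name "warehouse" (override with DB=...), non-interactive
# mclient login, and heapsize/hashes/imprints columns in storage().
DB="${DB:-warehouse}"

STORAGE_SQL='SELECT CAST(SUM(columnsize + COALESCE(heapsize, 0)
                             + COALESCE(hashes, 0) + COALESCE(imprints, 0))
                         / POWER(1024, 3) AS INT) AS total_gb
             FROM storage();'

storage_total_gb() {
    # -d selects the database, -s runs a single statement and exits
    mclient -d "$DB" -s "$STORAGE_SQL"
}

# Usage:
#   storage_total_gb
```

If this total is still far below what du reports, the remaining space is not held by any table column.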

And the size of the farm on disk:

sudo du -hs ./*
3,2T    ./data_warehouse
5,5M    ./merovingian.log

The difference in size is unexplained and appeared suddenly after I launched the query that generated the extremely large result.

I can track the extra space down to the merovingian.log file and to the BAT directory inside the warehouse database, where there are many large files named after integers with .tail or .theap extensions:

sudo du -hs ./*
2,0T    ./data_warehouse
1,3T    ./merovingian.log
4,0K    ./merovingian.pid
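To see which individual files account for the space, here is a sketch that lists the largest files under a directory, largest first, with sizes in KiB as reported by du. The function name largest_files is hypothetical and the path in the usage comment is a placeholder. It only reads; it deletes nothing.

```shell
#!/bin/sh
# Sketch: list the N largest regular files under a directory, biggest first.
# Each output line is "<size in KiB><TAB><path>", as printed by du -k.
largest_files() {
    local dir="$1" n="${2:-20}"
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}

# Usage (the path is a placeholder for your dbfarm):
#   largest_files /<path-to>/data_warehouse 20
```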

My question is: how can I manually free this disk space without corrupting the database? Can any of these files be safely deleted, or is there a command I can run to make MonetDB free this space?

So far I've tried the following with no effect:

  • Restarting the database
  • Installing the latest version of the database (the last time this happened); my current version is MonetDB Database Server Toolkit v11.37.11 (Jun2020-SP1)
  • Various VACUUM and FLUSH commands documented here (note that VACUUM doesn't run on my version)
  • Checking online and reading the mailing list

Many thanks in advance for any assistance.


Solution

  • Normally, MonetDB frees memory and files that are no longer needed during query execution. If that doesn't happen, you can try the following manual clean-up.

    First, lock and stop the database (I assume it's called warehouse):

    monetdb lock warehouse
    monetdb stop warehouse
    

    You can fairly safely remove merovingian.log to regain 1.3T (this log file can contain useful debugging information, but at its current size it's hard to use). The kill command sends SIGHUP to monetdbd, telling it to start a new log file:

    rm /<path-to>/merovingian.log
    kill -HUP `pgrep monetdbd`
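The two commands above can be wrapped with a few checks. This is a minimal sketch, assuming you pass the full path to merovingian.log; rotate_merovingian_log is a hypothetical name. The signal matters because a deleted file's blocks are only released once every process holding it open closes it, so until monetdbd reopens its log the space is presumably not given back.

```shell
#!/bin/sh
# Sketch of "rm merovingian.log; kill -HUP monetdbd" with basic checks.
# rotate_merovingian_log is a hypothetical name; pass the full log path.
rotate_merovingian_log() {
    local logfile="$1" pids
    if [ ! -f "$logfile" ]; then
        echo "no such log file: $logfile" >&2
        return 1
    fi
    rm -- "$logfile"
    # An unlinked file's blocks are freed only when the last open handle is
    # closed, so monetdbd must reopen its log before the space comes back.
    if ! pids=$(pgrep -x monetdbd); then
        echo "monetdbd is not running; nothing to signal" >&2
        return 0
    fi
    # $pids is left unquoted on purpose: pgrep may return several PIDs.
    kill -HUP $pids
}

# Usage (the path is a placeholder):
#   rotate_merovingian_log /<path-to>/merovingian.log
```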
    

    Then restart the database:

    monetdb release warehouse
    monetdb start warehouse
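The lock/stop and release/start steps bracket whatever clean-up you do in between. A sketch of that bracket as one helper that runs an arbitrary maintenance command and always tries to bring the database back; with_database_stopped is a hypothetical name, and the monetdb subcommands are the ones used in the steps above.

```shell
#!/bin/sh
# Sketch: lock and stop a database, run a maintenance command, then release
# and start it again. with_database_stopped is a hypothetical helper name.
with_database_stopped() {
    local db="$1" status
    shift
    monetdb lock "$db" || return 1
    if ! monetdb stop "$db"; then
        # Don't leave the database locked if it could not be stopped.
        monetdb release "$db"
        return 1
    fi
    "$@"            # the maintenance command passed as remaining arguments
    status=$?
    monetdb release "$db" || return 1
    monetdb start "$db" || return 1
    return "$status"
}

# Usage (the path is a placeholder):
#   with_database_stopped warehouse rm /<path-to>/merovingian.log
```

The maintenance command's exit status is returned only after release and start, so a failed clean-up step does not leave the database locked.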
    

    During start-up, the MonetDB server should clean up the transient data files left over from the previous session.
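To check whether the start-up clean-up actually reclaimed space, you can compare the database directory's usage before and after the restart. A sketch with hypothetical helper names (dir_kib, report_freed); sizes are KiB as reported by du -sk, and the path is a placeholder.

```shell
#!/bin/sh
# Sketch: measure a directory's disk usage in KiB and report how much was freed.
dir_kib() {
    du -sk -- "$1" | cut -f1
}

report_freed() {
    # $1 = usage before (KiB), $2 = usage after (KiB)
    local freed=$(( $1 - $2 ))
    echo "freed ${freed} KiB ($(( freed / 1024 / 1024 )) GiB)"
}

# Usage (the path is a placeholder):
#   before=$(dir_kib /<path-to>/data_warehouse)
#   ... lock, stop, clean up, release, start ...
#   after=$(dir_kib /<path-to>/data_warehouse)
#   report_freed "$before" "$after"
```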


    Concerning the size difference between SUM(columnsize) and on-disk size:

    • Besides the column files, there can be index files and string heap files. storage() reports their sizes in separate columns (such as heapsize, hashes and imprints), which SUM(columnsize) does not include.
    • In your case, the database directory probably contains a lot of intermediate data files generated for the computation of your query.
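If those intermediate files date from the runaway query, find's -newer test can narrow a listing to files modified after a given time. A read-only sketch; files_newer_than is a hypothetical name, and the timestamp uses touch -t's [[CC]YY]MMDDhhmm format. Don't delete anything it lists while the server is running; the restart described above is the safer way to let MonetDB remove its own transient files.

```shell
#!/bin/sh
# Sketch: list files under a directory modified after a given time, largest
# first, as "<size in KiB><TAB><path>". $2 uses touch -t format.
files_newer_than() {
    local ref
    ref=$(mktemp) || return 1
    touch -t "$2" "$ref"            # reference file carrying the cut-off mtime
    find "$1" -type f -newer "$ref" -exec du -k {} + | sort -rn
    rm -f "$ref"
}

# Usage (path and time are placeholders):
#   files_newer_than /<path-to>/data_warehouse 202010011200
```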