Search code examples
hsqldb

Why does running an update query on an embedded HSQLDB consume a lot of memory?


I am using HSQLDB 2.4.1 (embedded with cached tables). I have a large database (with ~21 million rows. DB Size is 5GB) I'm trying to run the following query:

UPDATE TABLE_NAME SET COLUMN1=0

I tried changing and playing around with theses properties but eventually this update statement is consuming huge amount of ram, as if it is copying the entire database into memory.

    properties.setProperty("hsqldb.large_data" , "true");
    properties.setProperty("hsqldb.log_data" , "false");
    properties.setProperty("hsqldb.default_table_type" , "cached");
    properties.setProperty("hsqldb.result_max_memory_rows" , "50000");
    properties.setProperty("hsqldb.tx" , "mvcc");
    properties.setProperty("sql.enforce_tdc_update" , "false");
    properties.setProperty("shutdown" , "true");
    properties.setProperty("runtime.gc_interval" , "100000");

When I execute this query in DBeaver, I notice that the memory consumption increases significantly and keeps increasing until it reaches the maxmem of 4GB at which point the application crashes with an out of memory error.

PS: Running this exact query on an identical embedded Derby database takes around 5 minutes, but eventually returns and memory usage in DBeaver remains constant at around ~400mb.


Solution

  • All the updated rows are loaded into memory until commit. You can update in chunks based on primary key and a LIMIT clause. For example:

     UPDATE TABLE_NAME SET COLUMN1= 0 WHERE COLUMN1 != 0 AND (PK_ID > 1000000 AND PK_ID < 2000000) LIMIT 1000000
    

    The above statement shows two different techniques to limit the rows. First, by using an indexed column to limit; Second by using LIMIT to stop updating after a number of rows have been updated.

    http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#dac_update_statement