In my HSQLDB database I have a large (>10M rows) cached table 'Products' where each Product has a 'groupId' and a 'value' column. There is also an index on the groupId column. In another cached table 'Groups' (also large, >1M rows) I need to store the aggregated values of all Products that belong to each group. So basically my query would look like this:
MERGE INTO Groups AS g USING (
SELECT groupId, SUM(value) AS aggrValue
FROM Products GROUP BY groupId
) AS sub
ON g.groupId = sub.groupId
WHEN MATCHED THEN UPDATE SET g.value = sub.aggrValue
However, this does not work for large tables because, as stated in the documentation under 'Temporary Memory Use During Operations', all rows affected by the update are held in memory during the operation. I also tried first writing the aggregated values into a helper table with this query:
INSERT INTO HelperTable (groupId, value)
SELECT groupId, SUM(value)
FROM Products
GROUP BY groupId
but this also exceeds the available memory. I have already tried limiting the
hsqldb.result_max_memory_rows
setting to 100,000. What would be the preferred solution to aggregate the values and write them into the Groups table? My main requirement is that the operation runs with constant memory regardless of the number of Products and Groups.
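For reference, I applied the limit roughly like this (using the SQL statement form of the property; it can also be set in the connection properties):

SET DATABASE DEFAULT RESULT MEMORY ROWS 100000;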
By design, HSQLDB performs some operations such as GROUP BY entirely in memory. It also stores the transaction history in memory until commit.
You can perform the operation in chunks based on ranges of the groupId column. You need to commit after each chunk, so that the transaction history held in memory stays small.
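A minimal sketch of the chunked variant, assuming your application loops over ranges of groupId and binds the lower and upper bound of the current range to the two parameters on each iteration (the bounds and the chunk size are up to you; the existing index on groupId keeps each range scan cheap):

MERGE INTO Groups AS g USING (
  SELECT groupId, SUM(value) AS aggrValue
  FROM Products
  WHERE groupId >= ? AND groupId < ?  -- bounds of the current chunk, bound per iteration
  GROUP BY groupId
) AS sub
ON g.groupId = sub.groupId
WHEN MATCHED THEN UPDATE SET g.value = sub.aggrValue;

COMMIT;  -- commit after every chunk so the in-memory transaction history is released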
Alternatively, you can maintain the values as a delta update. When new products are added, the sums of their values are added to the stored sums; when products are removed, their values are subtracted from the stored sums.
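A sketch of the delta approach for additions, assuming a hypothetical staging table NewProducts (same groupId and value columns) that collects the products inserted since the last run; removals would work the same way with subtraction instead of addition:

MERGE INTO Groups AS g USING (
  SELECT groupId, SUM(value) AS deltaValue
  FROM NewProducts  -- hypothetical staging table of products added since the last run
  GROUP BY groupId
) AS sub
ON g.groupId = sub.groupId
WHEN MATCHED THEN UPDATE SET g.value = g.value + sub.deltaValue;

DELETE FROM NewProducts;  -- clear the staging table once the deltas are applied

COMMIT;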