In my HSQLDB database I have a large (>10M rows) cached table 'Products' where each Product has a 'groupId' and a 'value' column. There is also an index on the groupId column. In another cached table 'Groups' (also large, >1M rows) I need to store the aggregated values of all Products that belong to each group. So basically my query would look like this:
MERGE INTO Groups AS g USING (
SELECT groupId, SUM(value) AS aggrValue
FROM Products GROUP BY groupId
) AS sub
ON g.groupId = sub.groupId
WHEN MATCHED THEN UPDATE SET g.value = sub.aggrValue
However, this does not work for large tables because, as stated in the documentation under 'Temporary Memory Use During Operations', all rows affected by the update are held in memory during the operation. I also tried first writing the aggregated values into a helper table with this query:
INSERT INTO HelperTable (groupId, value)
SELECT groupId, SUM(value)
FROM Products
GROUP BY groupId
but this also exceeds the available memory. I have already tried limiting the
hsqldb.result_max_memory_rows
setting to 100,000. What would be the preferred solution to aggregate the values and write them into the Groups table? My main requirement is that the operation runs with constant memory regardless of the number of Products and Groups.
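For reference, I applied the limit roughly like this (using the SQL statement form of the property; it can also be set in the connection properties):

SET DATABASE DEFAULT RESULT MEMORY ROWS 100000;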
By design, HSQLDB performs some operations such as GROUP BY entirely in memory. It also stores the transaction history in memory until commit.
You can perform the operation in chunks based on ranges of the groupId column. You need to commit after each chunk, so that the transaction history held in memory stays small.
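A minimal sketch of the chunked variant, assuming your application loops over ranges of groupId and binds the lower and upper bound of the current range to the two parameters on each iteration (the bounds and the chunk size are up to you; the existing index on groupId keeps each range scan cheap):

MERGE INTO Groups AS g USING (
  SELECT groupId, SUM(value) AS aggrValue
  FROM Products
  WHERE groupId >= ? AND groupId < ?  -- bounds of the current chunk, bound per iteration
  GROUP BY groupId
) AS sub
ON g.groupId = sub.groupId
WHEN MATCHED THEN UPDATE SET g.value = sub.aggrValue;

COMMIT;  -- commit after every chunk so the in-memory transaction history is released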
Alternatively, you can maintain the values as a delta update. When new products are added, the sums of their values are added to the stored sums; when products are removed, their values are subtracted from the stored sums.
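A sketch of the delta approach for additions, assuming a hypothetical staging table NewProducts (same groupId and value columns) that collects the products inserted since the last run; removals would work the same way with subtraction instead of addition:

MERGE INTO Groups AS g USING (
  SELECT groupId, SUM(value) AS deltaValue
  FROM NewProducts  -- hypothetical staging table of products added since the last run
  GROUP BY groupId
) AS sub
ON g.groupId = sub.groupId
WHEN MATCHED THEN UPDATE SET g.value = g.value + sub.deltaValue;

DELETE FROM NewProducts;  -- clear the staging table once the deltas are applied

COMMIT;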