Search code examples
javajdbcprepared-statementhsqldb

Select 100+ millions of rows in HSQLDB


I have to iterate through a table with 100+ millions of records with JDBC on a HSQLDB database and I don't manage to do it in a reasonable time. I use hsqldb v2.4.0.

I tried to use a PreparedStatement with the following query to slice the data:

String select = "SELECT ID, NAME, VALUE FROM MY_TABLE ORDER BY ID OFFSET ? ROWS FETCH ? ROWS ONLY";

The problem is that it takes more and more time as we go through the table. Note that the ID column is indexed.

I tried to put a fetch size but it doesn't work either :

String select = "SELECT ID, NAME, VALUE FROM MY_TABLE";
PreparedStatement selectStatement = connection.prepareStatement(select, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
selectStatement.setFetchSize(5000);

And then I iterate through the ResultSet :

ResultSet result = selectStatement.executeQuery();
while (result.next()) {
    Long id = result.getLong(1);
    // do stuff ...
} 

HSQLDB still tries to fetch all the rows of the table and the returned ResultSetdoesn't fit in memory. Here is the stacktrace :

java.lang.OutOfMemoryError: Java heap space
at org.hsqldb.navigator.RowSetNavigatorData.ensureCapacity(Unknown Source)
at org.hsqldb.navigator.RowSetNavigatorData.add(Unknown Source)
at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
at org.hsqldb.QuerySpecification.getResult(Unknown Source)
at org.hsqldb.StatementQuery.getResult(Unknown Source)
at org.hsqldb.StatementDMQL.execute(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
at org.hsqldb.Session.execute(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:174)
at myclass at the line ResultSet result = selectStatement.executeQuery();

Is there a way to achieve this in HSQLDB?


Solution

  • This issue is not related to memory use, as not too much memory is used for this type of SELECT.

    The increase in time to select the result is expected. The OFFSET clause in the SELECT indicates how many rows are skipped. As it gets larger, more rows are selected and skipped.

    You need to modify your SELECT to:

    SELECT ID, NAME, VALUE FROM MY_TABLE WHERE ID > ? ORDER BY ID FETCH ? ROWS ONLY
    

    And you process the result like this, using a running lastID with a PreparedStatement.

    long lastID = -1;
    
    // repeat the rest of the code until the result is empty
    selectStatement.setLong(1, lastID);
    selectStatement.setInt(2, 100000);
    
    ResultSet result = selectStatement.executeQuery();
    while (result.next()) {
     Long id = result.getLong(1);
     lastID = id;
     // do stuff ...
    }
    

    And