Tags: java, sql, jdbc, jooq, clob

stream contents of 1GB file to sqlite table under single column


The implementation below throws an OutOfMemoryError for a 1 GB file, even with 4 GB of heap space. Files.lines returns a lazy stream, but Collectors.joining accumulates the entire file into a single String, which causes the heap error.

Can we stream a file into the table with a smaller memory footprint using jOOQ and JDBC, while preserving the original line separators?

Stream<String> lines = Files.lines(path);

dsl.createTable(TABLE1)
   .column(COL1, SQLDataType.CLOB)
   .column(COL2, SQLDataType.CLOB)
   .execute();

dsl.insertInto(TABLE1)
   .columns(COL1, COL2)
   .values(VAL1, lines.collect(Collectors.joining(System.lineSeparator())))
   .execute();

Error:

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:136) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:484) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:166) ~[na:1.8.0_141]
at java.util.StringJoiner.add(StringJoiner.java:185) ~[na:1.8.0_141]
at java.util.stream.Collectors$$Lambda$491/1699129225.accept(Unknown Source) ~[na:na]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:1.8.0_141]
at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_141]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_141]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_141]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_141]

Solution

  • jOOQ's default data type binding for the CLOB data type treats CLOB as an ordinary String, which works well for small to mid-sized LOBs. For larger LOBs, the streaming version of the JDBC API is more suitable. Ideally, you would create your own data type binding, in which you optimise your write operation for streaming: https://www.jooq.org/doc/latest/manual/code-generation/custom-data-type-bindings

    For example:

    class StreamingLobBinding implements Binding<String, File> {
        ...
        @Override
        public void set(BindingSetStatementContext<File> ctx) {
            // Ideally: register the input stream somewhere for explicit resource management
            ctx.statement()
               .setBinaryStream(ctx.index(), new FileInputStream(ctx.value()));
        }
    }
    

    And then, you can either apply this binding to your column and have the code generator pick it up, as documented in the link above, or apply it to a single usage only:

    DataType<File> fileType = COL2.getDataType().asConvertedDataType(new StreamingLobBinding());
    Field<File> fileCol = DSL.field(COL2.getName(), fileType);
    dsl.insertInto(TABLE1)
       .columns(COL1, fileCol)
       .values(VAL1, DSL.val(theFile, fileType))
       .execute();
    

    Note that currently, you may need to register your input stream in some ThreadLocal to remember it and clean it up after statement execution. A future jOOQ version will offer an SPI to handle this: https://github.com/jOOQ/jOOQ/issues/7591
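As an aside on why the original code fails: Files.lines strips the line separators, and Collectors.joining then rebuilds the whole 1 GB file as a single String on the heap. Reading the file through a plain Reader avoids both problems: memory use is bounded by the buffer size, and the original separators pass through unchanged. The sketch below (stdlib only, class and buffer size are illustrative) demonstrates this chunked copy; inside a binding like the one above, you would hand the Reader to the statement's streaming setter instead of a Writer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCopy {

    // Copies character data in fixed-size chunks: heap usage is bounded by
    // the buffer, and line separators survive verbatim (unlike
    // Files.lines + Collectors.joining, which drops and re-inserts them).
    static void copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        // Mixed separators, to show they are preserved exactly
        String content = "line1\r\nline2\nline3";
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));

        StringWriter sink = new StringWriter();
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            copy(reader, sink);
        }
        System.out.println(sink.toString().equals(content)); // prints true
        Files.delete(path);
    }
}
```

The same pattern scales to a 1 GB file, since only one 8 KB buffer is ever held in memory at a time.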