Tags: java, sql, jdbc, jooq, clob

stream contents of 1GB file to sqlite table under single column


The implementation below throws an OutOfMemoryError for a 1 GB file, even with 4 GB of heap space. Files.lines returns a lazy stream, but Collectors.joining accumulates the entire file into a single String, which causes the heap error.

Can we stream a file into the table with a smaller memory footprint using jOOQ and JDBC, while preserving the original line separators?

Stream<String> lines = Files.lines(path);

dsl.createTable(TABLE1)
   .column(COL1, SQLDataType.CLOB)
   .column(COL2, SQLDataType.CLOB)
   .execute();

dsl.insertInto(TABLE1)
   .columns(COL1, COL2)
   .values(VAL1, lines.collect(Collectors.joining(System.lineSeparator())))
   .execute();

Error:

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:136) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:484) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:166) ~[na:1.8.0_141]
at java.util.StringJoiner.add(StringJoiner.java:185) ~[na:1.8.0_141]
at java.util.stream.Collectors$$Lambda$491/1699129225.accept(Unknown Source) ~[na:na]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:1.8.0_141]
at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_141]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_141]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_141]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_141]

Solution

  • jOOQ's default data type binding for the CLOB data type treats CLOB as an ordinary String, which works well for small to mid-sized LOBs. For larger LOBs, the streaming version of the JDBC API is more suitable. Ideally, you would create your own data type binding, in which you optimise your write operation for streaming: https://www.jooq.org/doc/latest/manual/code-generation/custom-data-type-bindings

    For example:

    class StreamingLobBinding implements Binding<String, File> {
        ...
        @Override
        public void set(BindingSetStatementContext<File> ctx) {
            // Ideally: register the input stream somewhere for explicit resource management
            ctx.statement()
               .setBinaryStream(ctx.index(), new FileInputStream(ctx.value()));
        }
    }
    

    And then, you can either apply this binding to your column and have the code generator pick it up, as documented in the link above, or apply it to a single usage only:

    DataType<File> fileType = COL2.getDataType().asConvertedDataType(new StreamingLobBinding());
    Field<File> fileCol = DSL.field(COL2.getName(), fileType);
    dsl.insertInto(TABLE1)
       .columns(COL1, fileCol)
       .values(VAL1, DSL.val(theFile, fileType))
       .execute();
    

    Note that currently, you may need to register your input stream in some ThreadLocal to remember it and clean it up after statement execution. A future jOOQ version will offer an SPI to handle this: https://github.com/jOOQ/jOOQ/issues/7591
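As an aside on why the original code fails: Files.lines strips the line separators, and Collectors.joining then rebuilds the whole 1 GB file as a single String on the heap. Reading the file through a plain Reader avoids both problems: memory use is bounded by the buffer size, and the original separators pass through unchanged. The sketch below (stdlib only, class and buffer size are illustrative) demonstrates this chunked copy; inside a binding like the one above, you would hand the Reader to the statement's streaming setter instead of a Writer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCopy {

    // Copies character data in fixed-size chunks: heap usage is bounded by
    // the buffer, and line separators survive verbatim (unlike
    // Files.lines + Collectors.joining, which drops and re-inserts them).
    static void copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        // Mixed separators, to show they are preserved exactly
        String content = "line1\r\nline2\nline3";
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));

        StringWriter sink = new StringWriter();
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            copy(reader, sink);
        }
        System.out.println(sink.toString().equals(content)); // prints true
        Files.delete(path);
    }
}
```

The same pattern scales to a 1 GB file, since only one 8 KB buffer is ever held in memory at a time.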