The implementation below throws an OutOfMemoryError for a 1 GB file with 4 GB of heap space. Files.lines returns a lazy stream, but Collectors.joining materialises the entire file content as a single String, which exhausts the heap.
Can we stream a file into the database with a smaller memory footprint using jOOQ and JDBC, while preserving the original line separators?
Stream<String> lines = Files.lines(path);

dsl.createTable(TABLE1)
   .column(COL1, SQLDataType.CLOB)
   .column(COL2, SQLDataType.CLOB)
   .execute();

dsl.insertInto(TABLE1)
   .columns(COL1, COL2)
   // Collectors.joining builds the entire file content in memory,
   // which is where the OutOfMemoryError occurs
   .values(VAL1, lines
       .collect(Collectors.joining(System.lineSeparator())))
   .execute();
Error ->
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:136) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:76) ~[na:1.8.0_141]
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:484) ~[na:1.8.0_141]
at java.lang.StringBuilder.append(StringBuilder.java:166) ~[na:1.8.0_141]
at java.util.StringJoiner.add(StringJoiner.java:185) ~[na:1.8.0_141]
at java.util.stream.Collectors$$Lambda$491/1699129225.accept(Unknown Source) ~[na:na]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:1.8.0_141]
at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_141]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_141]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_141]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_141]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_141]
jOOQ's default data type binding for the CLOB data type treats the CLOB as an ordinary String, which works well for small to mid-sized lobs. For larger lobs, the streaming version of the JDBC API is more suitable. Ideally, you would create your own data type binding, where you optimise your write operation for streaming:
https://www.jooq.org/doc/latest/manual/code-generation/custom-data-type-bindings
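For context, this is roughly what the streaming JDBC calls look like in plain JDBC (a minimal sketch; the connection variable and the literal table/column names are illustrative, and a character-based CLOB column is assumed):

try (Reader reader = Files.newBufferedReader(path);
     PreparedStatement stmt = connection.prepareStatement(
         "INSERT INTO table1 (col1, col2) VALUES (?, ?)")) {

    stmt.setString(1, val1);

    // The driver consumes the Reader lazily while executing the
    // statement, so the file content is never fully held in memory
    stmt.setCharacterStream(2, reader);
    stmt.executeUpdate();
}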
For example, a custom binding could look like this:
class StreamingLobBinding implements Binding<String, File> {

    // ... (the remaining Binding methods need to be implemented as well)

    @Override
    public void set(BindingSetStatementContext<File> ctx) throws SQLException {
        // Ideally: register the input stream somewhere for explicit
        // resource management, so that it can be closed after execution
        try {
            ctx.statement()
               .setBinaryStream(ctx.index(), new FileInputStream(ctx.value()));
        }
        catch (FileNotFoundException e) {
            throw new SQLException(e);
        }
    }
}
And then, you can either apply this binding to your column and have the code generator pick it up as documented in the link above, or apply it for a single usage:
// Re-type COL2 as a File column, backed by the streaming binding
DataType<File> fileType = COL2.getDataType().asConvertedDataType(new StreamingLobBinding());
Field<File> fileCol = DSL.field(COL2.getName(), fileType);

dsl.insertInto(TABLE1)
   .columns(COL1, fileCol)
   .values(VAL1, DSL.val(theFile, fileType))
   .execute();
Note that currently, you may need to register your input stream in some ThreadLocal in order to remember it and clean it up after statement execution; a sketch of that workaround follows below. A future jOOQ version will offer an SPI to handle this: https://github.com/jOOQ/jOOQ/issues/7591
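A minimal sketch of that workaround, assuming a hypothetical StreamRegistry helper that the binding's set() method would call instead of passing the FileInputStream directly, plus an ExecuteListener that closes the registered streams when execution ends (all names here are illustrative):

class StreamRegistry {
    private static final ThreadLocal<List<InputStream>> STREAMS =
        ThreadLocal.withInitial(ArrayList::new);

    // Call this from StreamingLobBinding.set() around the FileInputStream
    static InputStream register(InputStream in) {
        STREAMS.get().add(in);
        return in;
    }

    static void closeAll() {
        for (InputStream in : STREAMS.get())
            try { in.close(); } catch (IOException ignore) {}
        STREAMS.remove();
    }
}

class StreamClosingListener extends DefaultExecuteListener {
    // Invoked by jOOQ at the very end of statement execution
    @Override
    public void end(ExecuteContext ctx) {
        StreamRegistry.closeAll();
    }
}

The listener can then be registered on your Configuration, e.g. via new DefaultExecuteListenerProvider(new StreamClosingListener()).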