java postgresql hibernate jdbc

How can I let Hibernate merge consecutive INSERT statements into a multi-valued INSERT?


I have a JPA entity that looks like this:

@Entity
@Table(name = "timeseries_data")
public class TimeseriesDataEntry implements Serializable {
    @Id
    @Column(name = "timeseries_id", nullable = false)
    private String timeseriesID;

    @Id
    @Column(name = "data_timestamp", nullable = false)
    private Instant timestamp;

    @Column(name = "data_value")
    private BigDecimal value;

    // constructor/setter/getter boilerplate etc.
}

I then want to instantiate a large number of these entities and persist them somewhat efficiently. Currently I do this in a very simple way:

List<TimeseriesDataEntry> entries = new ArrayList<>();
Instant timestamp = Instant.EPOCH;
for (int i = 1; i <= 100; i++) {
    entries.add(new TimeseriesDataEntry("some-ts-id", timestamp, BigDecimal.valueOf(i)));
    timestamp = timestamp.plusHours(1);
}
entries.forEach(entityManager::persist);
entityManager.flush();

Having set the property hibernate.show_sql to true, I can see that Hibernate renders these statements:

Hibernate: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)
Hibernate: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)
Hibernate: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)
... (100 times total)

I am executing this against a PostgreSQL database with advanced statement logging turned on:

docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:14.2 -c log_statement=all

In the PostgreSQL logs I can see that the INSERT statements are executed separately:

LOG:  execute S_11: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3)
DETAIL:  parameters: $1 = '1', $2 = 'some-ts-id', $3 = '1970-01-01 00:00:00'
LOG:  execute S_11: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3)
DETAIL:  parameters: $1 = '2', $2 = 'some-ts-id', $3 = '1970-01-01 01:00:00'
LOG:  execute S_11: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3)
DETAIL:  parameters: $1 = '3', $2 = 'some-ts-id', $3 = '1970-01-01 02:00:00'
... (100 times total)

If I enable the PostgreSQL JDBC setting reWriteBatchedInserts by providing my JDBC connection string as follows:

jdbc:postgresql://localhost:5432/postgres?reWriteBatchedInserts=true

Hibernate still logs the INSERT statements separately, but I can now observe that they have been merged on the database side of things:

LOG:  execute S_10: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3),($4,$5,$6),($7,$8,$9),...(up to $384)
DETAIL:  parameters: $1 = '1', $2 = 'some-ts-id', $3 = '1970-01-01 00:00:00', $4 = '2', $5 = 'some-ts-id', $6 = '1970-01-01 01:00:00', $7 = '3', $8 = 'some-ts-id', $9 = '1970-01-01 02:00:00', ...

Now my question is: how do I tell Hibernate to do this transformation? I do not want to rely on database-specific JDBC settings.

This post on a different Stack Overflow question suggests that Hibernate does this automatically once I set hibernate.jdbc.batch_size to some value > 1. I have set it to 100.
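
For reference, here is a minimal sketch of how that batch size setting might be supplied programmatically. The class name BatchSettings and the method batchingProperties are hypothetical names used only for illustration; hibernate.jdbc.batch_size is the property mentioned above, and hibernate.order_inserts is an optional, related Hibernate setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Hibernate): collects the properties
// that enable JDBC batching so they can be handed to the persistence provider.
public class BatchSettings {

    public static Map<String, String> batchingProperties(int batchSize) {
        Map<String, String> props = new HashMap<>();
        // Upper bound on how many statements Hibernate groups into one JDBC batch.
        props.put("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Optional: order inserts by entity type so that batches are not
        // interrupted when several entity types are persisted together.
        props.put("hibernate.order_inserts", "true");
        return props;
    }
}
```

Such a map could be passed as the second argument of Persistence.createEntityManagerFactory(unitName, props), or the same keys could be set in persistence.xml. Note that this only controls JDBC batching; it does not by itself change the SQL text that is sent.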

Also, Hibernate's Dialect interface has a method supportsValuesListForInsert defaulting to true that is not overridden in the PostgreSQLDialect, and a more general method supportsValuesList that is overridden to return true in the PostgreSQLDialect.

Unfortunately, I was unable to find any official documentation on such INSERT statement merging, but it looks like this feature does exist and I just haven't quite managed to get it to work.


Solution

  • That's right. When you set hibernate.jdbc.batch_size > 1, Hibernate sends multiple statements to the database together as one JDBC batch, but it does not merge them into a multi-valued INSERT.

    In the other Stack Overflow question you refer to, commenters on the answers point out that the inserts will not be merged without reWriteBatchedInserts.

    Yes, Hibernate knows whether the Dialect supports value lists for INSERT. Unfortunately, Hibernate does not use this information for batch inserts (I don't know why at the moment).

    Hibernate only uses this information when you execute an insert with a value list and the dialect does not support it. In that case Hibernate splits the insert into multiple statements: https://github.com/hibernate/hibernate-orm/blob/main/hibernate-core/src/main/java/org/hibernate/query/sqm/internal/QuerySqmImpl.java#L766

    Maybe this could be a feature request. I didn't know that Dialect had this information, and it's true that it would be interesting to use it.
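
If a multi-valued INSERT is needed without the driver setting, one possible workaround is to build the statement text yourself and execute it outside of Hibernate's entity batching. The following is only a sketch under that assumption: the class MultiValuedInsert and its build method are hypothetical helpers, not a Hibernate API, and the table and column names are taken from the question.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Hibernate or JDBC): renders an INSERT
// statement with one "(?,?,...)" group per row.
public class MultiValuedInsert {

    public static String build(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One placeholder group per row, e.g. "(?,?,?)" for three columns.
        String rowPlaceholders = columns.stream()
                .map(column -> "?")
                .collect(Collectors.joining(",", "(", ")"));
        // Repeat the group once per row, separated by commas.
        String values = String.join(",", Collections.nCopies(rowCount, rowPlaceholders));
        return "insert into " + table
                + " (" + String.join(",", columns) + ") values " + values;
    }
}
```

The resulting string could then be executed through a JDBC PreparedStatement, binding the values row by row with setObject. Keep in mind that this bypasses the persistence context, and that databases and drivers limit the number of bind parameters per statement, so a large list would still have to be split into chunks.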