I have a JPA entity that looks like this:
@Entity
@Table(name = "timeseries_data")
public class TimeseriesDataEntry implements Serializable {

    @Id
    @Column(name = "timeseries_id", nullable = false)
    private String timeseriesID;

    @Id
    @Column(name = "data_timestamp", nullable = false)
    private Instant timestamp;

    @Column(name = "data_value")
    private BigDecimal value;

    // constructor/setter/getter boilerplate etc.
}
I then want to instantiate a large number of these entities and persist them somewhat efficiently. Currently I do this in a very simple way:
List<TimeseriesDataEntry> entries = new ArrayList<>();
Instant timestamp = Instant.EPOCH;
for (int i = 1; i <= 100; i++) {
    entries.add(new TimeseriesDataEntry("some-ts-id", timestamp, BigDecimal.valueOf(i)));
    // Instant has no plusHours(), so add the hour via ChronoUnit
    timestamp = timestamp.plus(1, ChronoUnit.HOURS);
}
entries.forEach(entityManager::persist);
entityManager.flush();
Having set the property hibernate.show_sql to true, I can see that Hibernate renders these statements:
Hibernate: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)
Hibernate: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)
Hibernate: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)
... (100 times total)
I am executing this against a PostgreSQL database with full statement logging turned on (log_statement=all):
docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:14.2 -c log_statement=all
In the PostgreSQL logs I can see that the INSERT statements are executed separately:
LOG: execute S_11: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3)
DETAIL: parameters: $1 = '1', $2 = 'some-ts-id', $3 = '1970-01-01 00:00:00'
LOG: execute S_11: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3)
DETAIL: parameters: $1 = '2', $2 = 'some-ts-id', $3 = '1970-01-01 01:00:00'
LOG: execute S_11: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3)
DETAIL: parameters: $1 = '3', $2 = 'some-ts-id', $3 = '1970-01-01 02:00:00'
... (100 times total)
If I enable the PostgreSQL JDBC setting reWriteBatchedInserts by providing my JDBC connection string as follows:
jdbc:postgresql://localhost:5432/postgres?reWriteBatchedInserts=true
Hibernate still logs the INSERT statements separately, but I can now observe that they arrive at the database merged into a single multi-row INSERT:
LOG: execute S_10: insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ($1,$2,$3),($4,$5,$6),($7,$8,$9),...(up to $384)
DETAIL: parameters: $1 = '1', $2 = 'some-ts-id', $3 = '1970-01-01 00:00:00', $4 = '2', $5 = 'some-ts-id', $6 = '1970-01-01 01:00:00', $7 = '3', $8 = 'some-ts-id', $9 = '1970-01-01 02:00:00', ...
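Conceptually, the rewrite visible in this log collapses N copies of the single-row INSERT into one statement and renumbers the placeholders row by row, so row r (0-based) of a 3-column insert uses $(3r+1) to $(3r+3). The sketch below only reproduces that SQL text; the class and method names are hypothetical and this is not pgJDBC's actual implementation.

```java
import java.util.StringJoiner;

// Hypothetical helper: rebuilds the multi-row INSERT text seen in the PostgreSQL log.
// It is NOT the pgJDBC implementation; it only shows how placeholders get renumbered.
class MultiRowInsertSketch {

    /**
     * @param insertPrefix everything up to and including "values ", e.g.
     *                     "insert into timeseries_data (data_value,timeseries_id,data_timestamp) values "
     * @param columnCount  number of bind parameters per row
     * @param rowCount     number of batched single-row INSERTs to merge
     */
    static String merge(String insertPrefix, int columnCount, int rowCount) {
        StringJoiner rows = new StringJoiner(",");
        int param = 1; // PostgreSQL placeholders run $1, $2, ... across the whole statement
        for (int r = 0; r < rowCount; r++) {
            StringJoiner row = new StringJoiner(",", "(", ")");
            for (int c = 0; c < columnCount; c++) {
                row.add("$" + param++);
            }
            rows.add(row.toString());
        }
        return insertPrefix + rows;
    }

    public static void main(String[] args) {
        String prefix = "insert into timeseries_data (data_value,timeseries_id,data_timestamp) values ";
        // Prints the same shape as the logged statement, for three rows
        System.out.println(merge(prefix, 3, 3));
    }
}
```

Running main prints `... values ($1,$2,$3),($4,$5,$6),($7,$8,$9)`, which matches the beginning of the logged statement.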
Now my question is: how do I tell Hibernate to do this transformation? I do not want to rely on database-specific JDBC settings.
This post on a different Stack Overflow question suggests that Hibernate does this automatically once I set hibernate.jdbc.batch_size to some value > 1. I have set it to 100.
Also, Hibernate's Dialect class has a method supportsValuesListForInsert, defaulting to true, that is not overridden in PostgreSQLDialect, and a more general method supportsValuesList that PostgreSQLDialect overrides to return true.
Unfortunately, I was unable to find any official documentation on such INSERT statement merging, but it looks like this feature does exist and I just haven't quite managed to get it to work.
That's right. When you set batch_size > 1, Hibernate sends multiple statements together as a JDBC batch, but it does not merge them into a multi-valued INSERT.
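To make the distinction concrete: a JDBC batch is still N separate single-row statements shipped together, and the batch size only decides how many are grouped before executeBatch() is called. Here is a minimal sketch of that grouping with plain collections; the names are hypothetical and no Hibernate or JDBC code is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a batch size groups single-row statements.
// Each group would be sent as one JDBC batch (addBatch() per row, then executeBatch()),
// but every element inside a group is still its own single-row INSERT.
class BatchGroupingSketch {

    static <T> List<List<T>> group(List<T> statements, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < statements.size(); from += batchSize) {
            int to = Math.min(from + batchSize, statements.size());
            batches.add(new ArrayList<>(statements.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> inserts = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            inserts.add("insert into timeseries_data (data_value,timeseries_id,data_timestamp) values (?,?,?)");
        }
        // batch_size = 100: one batch, but it still holds 100 single-row statements
        List<List<String>> batches = group(inserts, 100);
        System.out.println(batches.size() + " batch(es), first has " + batches.get(0).size() + " statements");
    }
}
```

With 100 entities and batch_size 100 this yields one batch of 100 statements, which is why the PostgreSQL log still shows 100 individual executions unless the driver rewrites them.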
In the other Stack Overflow question you linked, some people point out in the comments on the answers that without reWriteBatchedInserts the inserts will not be merged.
Yes, Hibernate knows whether the Dialect supports a values list for INSERT. Unfortunately, Hibernate does not use this information for batch inserts (I don't know why yet).
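Until then, one way to get a merged INSERT without a driver-specific flag is to build the multi-row statement yourself and execute it through plain JDBC, for example with a Connection obtained via Hibernate's Session.doWork (that wiring is left out here). This is a minimal sketch under stated assumptions: the class, the Row holder and the method names are hypothetical, it assumes the database accepts multi-row VALUES lists (PostgreSQL does, but not every database does, which is presumably why the dialect exposes supportsValuesListForInsert), and it bypasses the persistence context, so the inserted rows are not managed entities.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of a manual multi-row INSERT with standard JDBC '?' placeholders.
// Assumes the target database accepts multi-row VALUES lists; not a Hibernate API.
class ManualMultiRowInsert {

    // Plain holder mirroring the three mapped columns of TimeseriesDataEntry.
    static final class Row {
        final String timeseriesId;
        final Instant timestamp;
        final BigDecimal value;

        Row(String timeseriesId, Instant timestamp, BigDecimal value) {
            this.timeseriesId = timeseriesId;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Builds "insert ... values (?,?,?),(?,?,?),..." with one group per row.
    static String buildSql(int rowCount) {
        StringJoiner values = new StringJoiner(",");
        for (int i = 0; i < rowCount; i++) {
            values.add("(?,?,?)");
        }
        return "insert into timeseries_data (data_value,timeseries_id,data_timestamp) values " + values;
    }

    // Inserts all rows with a single statement. Callers should chunk large lists,
    // because drivers limit the number of bind parameters per statement.
    static int insertAll(Connection connection, List<Row> rows) throws SQLException {
        if (rows.isEmpty()) {
            return 0;
        }
        try (PreparedStatement ps = connection.prepareStatement(buildSql(rows.size()))) {
            int p = 1;
            for (Row row : rows) {
                ps.setBigDecimal(p++, row.value);
                ps.setString(p++, row.timeseriesId);
                // Note: setTimestamp without a Calendar uses the JVM default time zone
                ps.setTimestamp(p++, Timestamp.from(row.timestamp));
            }
            return ps.executeUpdate();
        }
    }
}
```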
Hibernate only uses this information when you execute an INSERT with a values list (for example an HQL insert ... values statement) and the dialect does not support it. In that case, Hibernate splits the statement into multiple single-row INSERTs: https://github.com/hibernate/hibernate-orm/blob/main/hibernate-core/src/main/java/org/hibernate/query/sqm/internal/QuerySqmImpl.java#L766
Maybe this could be a feature request. I didn't know that Dialect exposed this information, and it would indeed be interesting to use it for batch inserts.
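As a rough illustration of what such a feature might look like (a hypothetical sketch, not existing Hibernate code): when the dialect reports support for VALUES lists in INSERT, each batch of up to batch_size rows becomes one multi-row statement; otherwise it falls back to one single-row statement per entity, which is what happens today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the feature discussed above; not existing Hibernate code.
// supportsValuesListForInsert stands in for the Dialect capability of the same name.
class ValuesListBatchingSketch {

    static List<String> plan(String insertPrefix, int columnCount, int rowCount,
                             int batchSize, boolean supportsValuesListForInsert) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        // One row group, e.g. "(?,?,?)" for three columns
        String oneRow = "(" + String.join(",", Collections.nCopies(columnCount, "?")) + ")";
        List<String> statements = new ArrayList<>();
        if (!supportsValuesListForInsert) {
            // Fallback: one single-row INSERT per entity (current behaviour)
            for (int i = 0; i < rowCount; i++) {
                statements.add(insertPrefix + oneRow);
            }
            return statements;
        }
        // Merge each batch of up to batchSize rows into one multi-row INSERT
        for (int from = 0; from < rowCount; from += batchSize) {
            int rowsInChunk = Math.min(batchSize, rowCount - from);
            statements.add(insertPrefix + String.join(",", Collections.nCopies(rowsInChunk, oneRow)));
        }
        return statements;
    }
}
```

For the question's case (100 rows, batch_size 100, PostgreSQL), this plan would produce a single 100-row INSERT instead of 100 separate statements.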