We're developing a lightweight CRUD app and have opted to use the Serializable
isolation level for our transactions.
However, when ramping up the load on our environment we're seeing large numbers of failures for transactions that we didn't think would cause any problems. In particular, we have one transaction which we've managed to strip down to the following and still replicate problems:
transaction(Connection.TRANSACTION_SERIALIZABLE, 3) {
    val record = MyRecord(UUID.randomUUID(), UUID.randomUUID(), DEFAULT_JSON)
    myDao().getRecord(record.id)
    myDao().addRecord(record)
}
which translates into SQL as:
SELECT mytable.id, mytable.userId, mytable.json, mytable.deleted_at
FROM mytable
WHERE mytable.id = '93ea4a65-cd52-4d73-ae74-38055c1b066b'
INSERT INTO mytable (deleted_at, json, id, user_id)
VALUES (NULL, '{"version":7}', '93ea4a65-cd52-4d73-ae74-38055c1b066b', '026d3c48-cdc5-4748-927b-408712e00f89')
I.e., a simple retrieve-then-insert keyed on a PRIMARY KEY UUID column. When we ramp this up (e.g. 40 threads, each running 50 of these transactions in a row), the vast majority fail with the following exception:
o.p.u.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during write.
Hint: The transaction might succeed if retried.
at o.p.c.v.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
at o.p.c.v.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178)
at o.p.c.v.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
at o.p.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at o.p.jdbc.PgStatement.execute(PgStatement.java:365)
at o.p.j.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
at o.p.j.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:132)
at o.j.e.s.s.InsertStatement.execInsertFunction(InsertStatement.kt:86)
at o.j.e.s.s.InsertStatement.executeInternal(InsertStatement.kt:95)
at o.j.e.s.s.InsertStatement.executeInternal(InsertStatement.kt:12)
at o.j.e.s.s.Statement.executeIn$exposed(Statement.kt:59)
... 90 common frames omitted
Wrapped by: o.j.e.e.ExposedSQLException: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during write.
Hint: The transaction might succeed if retried.
at o.j.e.s.s.Statement.executeIn$exposed(Statement.kt:61)
at o.j.e.s.Transaction.exec(Transaction.kt:129)
at o.j.e.s.Transaction.exec(Transaction.kt:123)
at o.j.e.s.s.Statement.execute(Statement.kt:29)
at o.j.e.sql.QueriesKt.insert(Queries.kt:44)
at g.c.e.d.MyDao.insertEvent(DefaultEventsDao.kt:40)
... 81 common frames omitted
Digging into pg_locks whilst the threads are running, we can see:
| locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath |
|----------|----------|----------|------|-------|------------|---------------|---------|-------|----------|--------------------|-----|------|---------|----------|
| page | 18496 | 17542 | 2 | <null> | <null> | <null> | <null> | <null> | <null> | 30/75 | 1467 | SIReadLock | True | False |
| page | 18496 | 17542 | 5 | <null> | <null> | <null> | <null> | <null> | <null> | 34/45 | 1471 | SIReadLock | True | False |
| page | 18496 | 17542 | 2 | <null> | <null> | <null> | <null> | <null> | <null> | 8/335 | 1446 | SIReadLock | True | False |
| page | 18496 | 17542 | 1 | <null> | <null> | <null> | <null> | <null> | <null> | 31/65 | 1468 | SIReadLock | True | False |
| page | 18496 | 17542 | 6 | <null> | <null> | <null> | <null> | <null> | <null> | 43/15 | 1480 | SIReadLock | True | False |
| page | 18496 | 17542 | 4 | <null> | <null> | <null> | <null> | <null> | <null> | 5/357 | 1482 | SIReadLock | True | False |
| page | 18496 | 17542 | 6 | <null> | <null> | <null> | <null> | <null> | <null> | 41/15 | 1478 | SIReadLock | True | False |
| page | 18496 | 17542 | 6 | <null> | <null> | <null> | <null> | <null> | <null> | 40/30 | 1477 | SIReadLock | True | False |
Relation 17542 corresponds to the primary key index of our table (verified by querying pg_class). So it appears that each SELECT acquires a predicate (SIRead) lock on an index page, and the transactions then fail because other inserts into the same page are happening concurrently.
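For reference, the mapping from the OID in pg_locks.relation to a relation name can be done with a catalog join like the following (a sketch using the standard pg_locks and pg_class columns; run it in a second session while the load test is in flight):

```sql
-- Show which relations currently hold SIRead (predicate) locks.
-- pg_locks.relation is the OID of the locked relation; joining to
-- pg_class gives its name and kind ('i' = index, 'r' = table).
SELECT l.locktype,
       l.page,
       c.relname,
       c.relkind,
       l.pid
FROM pg_locks AS l
JOIN pg_class AS c ON c.oid = l.relation
WHERE l.mode = 'SIReadLock';
```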
This theory gains weight from the fact that as the table grows, the failure rate reproduced by our test decreases (the records are spread over more pages, so fewer collisions occur).
My question, then, is: is this behaviour expected under the Serializable isolation level? Lowering to Repeatable Read eliminates the problem, but we're reluctant to do that without understanding more.

We are using Exposed within a Ktor backend written in Kotlin, in case that's relevant. Our Postgres version is 9.6.
This is working as expected; the PostgreSQL documentation on the Serializable isolation level notes:
For optimal performance when relying on Serializable transactions for concurrency control, these issues should be considered:
[...]
- When the system is forced to combine multiple page-level predicate locks into a single relation-level predicate lock because the predicate lock table is short of memory, an increase in the rate of serialization failures may occur. You can avoid this by increasing max_pred_locks_per_transaction, max_pred_locks_per_relation, and/or max_pred_locks_per_page.
In your test case, there are three or more predicate locks on a single index page, so they are combined into one page-level lock. That is why the transactions conflict with each other: they now all appear to have read the same page.
Try increasing max_pred_locks_per_page.
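As a starting point, a postgresql.conf fragment might look like this (a sketch; the values are illustrative rather than tuned, and note that max_pred_locks_per_page and max_pred_locks_per_relation only exist from PostgreSQL 10 onward, while max_pred_locks_per_transaction changes require a server restart):

```
# postgresql.conf -- predicate lock tuning (illustrative values)
max_pred_locks_per_transaction = 128   # default 64; restart required
max_pred_locks_per_page = 8            # default 2; PostgreSQL 10+
```

With the default of 2, the third fine-grained predicate lock on a page is already enough to trigger promotion to page granularity, which matches the page-level SIReadLock rows seen in pg_locks above.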