postgresql, edgedb

Serialization error when bulk-inserting in a "SERIALIZABLE" transaction


What I did

  • I created two tables, participants and participant_dates
  • in one transaction, I insert one participant "A" and 20 participant_dates rows for "A"

Below is my full Python code:

import asyncio
from datetime import date
from uuid import uuid4

import asyncpg

async def get_connection():
    return await asyncpg.connect(
        user='postgres',
        password='1234',
        database='postgres',
        host='127.0.0.1',
        port='5432'
    )


async def bulk() -> None:
    """
    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

    CREATE TABLE IF NOT EXISTS participants (
        id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
        name VARCHAR(255) NOT NULL
    );

    CREATE TABLE IF NOT EXISTS participant_dates (
        id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
        participant_id UUID REFERENCES participants(id),
        date DATE
    );
    """
    conn = await get_connection()
    try:
        async with conn.transaction(isolation="serializable"):
            id_ = await conn.fetchval(
                f""" 
            INSERT INTO participants (name) VALUES ('test_{str(uuid4())}') RETURNING id;
            """
            )
            await conn.executemany(
                """
                INSERT INTO participant_dates (participant_id, date) VALUES (
                    $1, 
                    $2
                ); 
                """,
                # one participant_dates row per day, 2025-01-01 through 2025-01-20
                [(id_, date(2025, 1, day)) for day in range(1, 21)]
            )

    except Exception as e:
        print(e)
    finally:
        # close the connection even when the transaction fails
        await conn.close()


async def main() -> None:
    # 50 concurrent calls
    await asyncio.gather(*(bulk() for _ in range(50)))

asyncio.run(main())

What happened

A bunch of the errors below appeared:

could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on conflict out to pivot 882, during read.
HINT:  The transaction might succeed if retried.

What I want to do

  • I cannot lower the transaction isolation level (I'm using EdgeDB, and it forces SERIALIZABLE).
  • I want to avoid the errors.
  • I want to maintain full concurrency (50 or more concurrent calls).

Question

  • Could you please explain why there are conflicts? I use UUIDs (not auto-incremented integers, so I assume no sequence lock is needed), and all the transactions insert different rows. Where do the (lock) conflicts come from?

P.S.

My problem is similar to "Why does PostgreSQL serializable transaction think this as conflict?", but even after I added

            await conn.execute(
                "SET enable_seqscan = off;"
            )
            id_ = await conn.fetchval(
                f"""
            INSERT INTO participants (name) VALUES ('test_{str(uuid4())}') RETURNING id;
            """
...
            )

the error stays the same:

could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on commit attempt with conflict in from prepared pivot.
HINT:  The transaction might succeed if retried.

Solution

  • An INSERT into participant_dates will look up and lock the referenced row in participants. With SERIALIZABLE isolation, this read will put a predicate lock on participants. This predicate lock is probably an SIRead lock on the index leaf page of participants' primary key index that contains the id in question.

    As soon as a concurrent transaction inserts rows for a different id that happens to be in the same index page, you will get a (false positive) serialization error. These false positive serialization errors are to be expected:

    While PostgreSQL's Serializable transaction isolation level only allows concurrent transactions to commit if it can prove there is a serial order of execution that would produce the same effect, it doesn't always prevent errors from being raised that would not occur in true serial execution.

    With SERIALIZABLE isolation, you have to be ready to repeat transactions if they fail with a serialization error (a minimal retry sketch follows below). It makes sense to reduce these errors, but the attempt to avoid them completely is a fool's errand. As a consolation, you will find that the errors will become less frequent as the tables grow.
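
    In practice that means wrapping the whole transaction in a retry loop. Below is a minimal sketch with asyncpg (the driver used in the question); it retries only on SQLSTATE 40001 (serialization_failure), which is the code behind both error messages above, using the sqlstate attribute that asyncpg exposes on server-side errors. The names with_retry and max_attempts and the backoff values are illustrative, not part of any library.

import asyncio
import random

import asyncpg


async def with_retry(make_attempt, max_attempts: int = 10):
    """Run a transaction-producing coroutine factory, retrying on serialization failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            # make_attempt must create a fresh coroutine (and a fresh transaction)
            # on every call; a coroutine object can only be awaited once.
            return await make_attempt()
        except asyncpg.PostgresError as e:
            # SQLSTATE 40001 = serialization_failure; re-raise anything else,
            # and give up after the last attempt.
            if e.sqlstate != '40001' or attempt == max_attempts:
                raise
            # Small randomized backoff so the retried transactions are less
            # likely to collide with each other again.
            await asyncio.sleep(random.uniform(0.01, 0.1) * attempt)


# usage (replacing the plain bulk() calls in main):
#     await asyncio.gather(*(with_retry(bulk) for _ in range(50)))
# Note that bulk() as posted swallows every exception with the broad
# `except Exception: print(e)` block; the serialization error has to
# propagate out of the function for the retry wrapper to see it.

    While the 50 transactions are running, you can also observe the predicate locks themselves: rows with mode SIReadLock in pg_locks show which index pages the foreign-key lookups have locked, which is where the "conflicting" reads come from.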