Search code examples
databasepostgresqldatabase-partitioningpostgresql-12database-locking

Postgres long-running transaction holding lock on parent partitioned table


TL;DR: we have long-running imports which seem to hold locks on the parent partitioned table even though nothing is directly referencing the parent table.

Background

In our system, we have inventories and inventory_items. Inventories tend to have 200k or so items, and it made sense for our access patterns to partition the inventory_items table by inventory_id using native partitioning (we're on Postgres 12). In other words, each inventory gets its own partitioned table of inventory_items. This is accomplished with the following DDL:

CREATE TABLE public.inventory_items (
  inventory_id integer NOT NULL,
  /* ... */
)
PARTITION BY LIST (inventory_id);

In our app code, when an inventory is created via the web dashboard, we automatically create the partitioned child inventory_items tables table via:

CREATE TABLE IF NOT EXISTS inventory_items_#{inventory_id}
  PARTITION OF inventory_items
  FOR VALUES IN (#{inventory_id});

Long import jobs block creating new inventories

It's typical for these inventories to be fully reloaded / reimported once per day, via CSV or otherwise, and these import tasks can sometimes take a while.

We noticed that while these long imports are running, it's not possible to create a new inventory, because, as mentioned above, creating an inventory means creating the partitioned child inventory_items table, and there is some lock contention between the long-running import and creating the inventory in the web dashboard, which is bad: we can't block users from creating inventories just because there's a totally unrelated import happening.

Sequence of events / locks when trying to create Inventory while import running

I'm using the following query in psql to determine who holds what locks:

select pid, relname, mode
from pg_locks l
join pg_class t on l.relation = t.oid
where t.relkind = 'r';

This query returns successfully obtained/held locks; it will not display pids that are waiting to obtain a lock (because some other pid holds it). For those, you have to look at the postgres logs.

Starting the slow import

Once the import starts, the worker process (pid 9029) grabs the following locks

 pid  |        relname     |       mode
------+--------------------+------------------
 9029 | inventory_items_16 | AccessShareLock
 9029 | inventory_items_16 | RowExclusiveLock

The inventory that we're importing into has an id of 16, so the locks being held are on the inventory_items partitioned child tables that belong to that inventory. Note that there doesn't appear to be any locks on the parent inventory_items table.

Attempt to create inventory in the web dashboard

When I try and create an inventory in the dashboard, the requests stalls and times out due to 30s SQL statement timeout. Before it times out, the locks look like this:

 pid  |        relname     |       mode
------+--------------------+------------------
 7089 | inventories        | RowExclusiveLock

 9029 | inventory_items_16 | AccessShareLock
 9029 | inventory_items_16 | RowExclusiveLock

PID 7089 is the web server. It successfully grabs RowExclusiveLock on inventories (the INSERT INTO inventories), but looking at the postgres logs, it's attempting and failing to grab an AccessExclusiveLock on 119795, which is the parent inventory_items table:

postgres.7089 [RED] [29-1]  sql_error_code = 00000 LOG:  statement: CREATE TABLE IF NOT EXISTS inventory_items_16
postgres.7089 [RED] [29-2]    PARTITION OF inventory_items
postgres.7089 [RED] [29-3]    FOR VALUES IN (16);
postgres.7089 [RED] [29-4]
postgres.7089 [RED] [30-1]  sql_error_code = 00000 LOG:  process 7089 still waiting for AccessExclusiveLock on relation 119795 of database 16402 after 1000.176 ms
postgres.7089 [RED] [30-2]  sql_error_code = 00000 DETAIL:  Process holding the lock: 9029. Wait queue: 7089.
postgres.7089 [RED] [30-3]  sql_error_code = 00000 STATEMENT:  CREATE TABLE IF NOT EXISTS inventory_items_16
postgres.7089 [RED] [30-4]    PARTITION OF inventory_items
postgres.7089 [RED] [30-5]    FOR VALUES IN (16);

I figure that the reason an AccessExclusiveLock is needed on the parent table when creating a child partition is because postgres needs to write some internal schema-y metadata to the parent table so it can route rows with inventory_id=16 to this new table, which makes sense to me.

But, judging by my pg_locks query, I don't understand where the lock contention is coming from. The web server needs an AccessExclusiveLock on the parent table, but pg_locks shows that the only locks held are on the child inventory_items_16 table.

So, what could be happening here? Do locks on child tables "expand" in locks on the parent table, or otherwise contend with locks on the parent table?

And is there some other way we could approach this problem? We feel pretty confident in our decision to partition these tables, but this unexpected lock contention is causing real problems, so we're looking for a clean, minimal-maintenance way to keep this basic architecture.

Last little tidbit

In rare cases, the presence of an active import does NOT block the web worker. 90% of the time it does, but sometimes it doesn't. So, somewhere in this mix is a tiny bit of nondeterminism which confounds everything.


Solution

  • Creating a partition with CREATE TABLE ... PARTITION OF ... requires an ACCESS EXCLUSIVE lock on the partitioned table, which will conflict with all access to the partitioned table.

    On the other hand, inserting into the partition requires an ACCESS SHARE lock on the partitioned table while the insert statement is being planned. That causes a lock conflict.

    I see two ways out:

    1. Create new partitions in two steps:

      CREATE TABLE inventory_items_42 (
         LIKE inventory_items INCLUDING DEFAULTS INCLUDING CONSTRAINTS
      );
      ALTER TABLE inventory_items
         ATTACH PARTITION inventory_items_42 FOR VALUES IN (42);
      

      That requires only a SHARE UPDATE EXCLUSIVE lock on the partitioned table (from PostgreSQL v12 on), which is compatible with concurrent inserts.

    2. Use a server prepared statement for the INSERT into the partition and make sure you prepare the statement before you start long running transaction that loads the data. You can use PostgreSQL's PREPARE and EXECUTE statements for that or use your API's facilities.