Tags: postgresql, concurrency, bulk-load, postgresql-copy

Can reads occur whilst executing a bulk \copy batch of inserts


I plan to batch-insert a large volume of rows into a Postgres table using the \copy command once per minute. My benchmarks show I should be able to insert about 40k rows per second, and I plan to do this for 3 or 4 seconds each minute.
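For context, a minimal sketch of such a batch load from psql (the table `measurements`, its columns, and the file `batch.csv` are hypothetical placeholders):

```sql
-- \copy runs client-side in psql and streams the file through a
-- server-side COPY, all within a single transaction by default.
\copy measurements (ts, sensor_id, reading) FROM 'batch.csv' WITH (FORMAT csv, HEADER true)
```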

Are read queries on the table blocked or otherwise impacted while the \copy load is running? I wonder the same about concurrent inserts.

I'm assuming as well that tables which aren't being \copy'd into will face no blocking issues.


Solution

  • The manual:

    The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.

    That's the beauty of the MVCC model used by Postgres.

    So, no, readers are not blocked - neither in the target table nor in any other table.

    Impacted? Well, bulk loading large amounts of data incurs considerable load on the system (especially I/O) which potentially impacts all other processes competing for the same resources. So if your system is already reaching some limits, readers may be impacted this way.

    Rows written by your COPY command (by way of psql's \copy) are not visible to other transactions until the transaction is committed.
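    This visibility rule is easy to verify with two psql sessions (table and file names are illustrative):

    ```sql
    -- Session A:
    BEGIN;
    \copy measurements FROM 'batch.csv' WITH (FORMAT csv)
    -- Session B, meanwhile, is neither blocked nor sees the new rows:
    --   SELECT count(*) FROM measurements;   -- still the old count
    COMMIT;  -- only now do the new rows become visible to new snapshots
    ```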

    Concurrent INSERT commands are not blocked either - unless you have UNIQUE (or PK) constraints / indexes where writes do compete. Avoid race conditions with overlapping unique values! And performance can be impacted even with non-unique indexes as writing to indexes involves some short-term locking.
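    Since COPY itself cannot skip duplicate keys, one common way to defuse such race conditions is to \copy into a staging table and merge from there. A sketch, assuming a primary key column `id` (table names are hypothetical):

    ```sql
    -- Unlogged: faster, and crash-safety is irrelevant for transient staging data.
    CREATE UNLOGGED TABLE measurements_stage (LIKE measurements);

    \copy measurements_stage FROM 'batch.csv' WITH (FORMAT csv)

    -- Merge, silently skipping rows whose key already exists.
    INSERT INTO measurements
    SELECT * FROM measurements_stage
    ON CONFLICT (id) DO NOTHING;

    TRUNCATE measurements_stage;
    ```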

    Generally, keep indexes on your table to a minimum if you plan huge bulk writes every minute. Every index incurs additional costs for the write - and may bloat more than the table if write patterns are unfavorable. Autovacuum may have a hard time keeping up.
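    If autovacuum falls behind, per-table storage parameters can make it trigger more aggressively for the bulk-loaded table. A sketch with illustrative values (not a tuned recommendation; the table name is a placeholder):

    ```sql
    -- Lower scale factors make autovacuum and auto-analyze kick in after a
    -- smaller fraction of the table has changed (defaults: 0.2 and 0.1).
    ALTER TABLE measurements SET (
        autovacuum_vacuum_scale_factor  = 0.01,
        autovacuum_analyze_scale_factor = 0.02
    );
    ```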