Search code examples
databasepostgresqlperformance

postgres behaviour with add foreign key and index concurrently on a large table in production?


I have a postgres database running in production with several tables containing billions of rows and I'd like to add a column to each of them with a foreign key to a user table. Assuming the data is populated, I was thinking to then add the foreign key constraint and index through CREATE INDEX CONCURRENTLY. However it's not clear to me how this performs in production while there are write transactions happening at the same time, specifically does the index creation process need to wait for all transactions to end to create the index all in one go, or does it do it gradually?

I'm concerned whether the size of the table will lead to a long table lock in production.


Solution

    1. Adding a column to an existing table, results in an ACCESS EXCLUSIVE lock. When you don't update the record with new content, to get started with a NULL, this can be pretty fast since there is no rewrite of the table required.

    2. Creating a foreign key also needs an ACCESS EXCLUSIVE lock. And this can also be very fast, just use NOT VALID to postpone the validation. Validation would be pointless anyway, everything is still NULL.

    3. Then you do the update of the table, all at once or large sets of records. This is going to take a while, and all records you update, will be locking using a ROW EXCLUSIVE lock. That means no other processes can update or delete these records. Reading current data and inserting new data is no problem.

    4. When all updates are done, you can create the index concurrently

    5. And finally validate the foreign key constraint. This takes a SHARE UPDATE EXCLUSIVE lock.

    Using this recipe the impact on production should be minimal. Beware of the extra disk space you might during the updates and the vacuum analyze that you might want to run afterwards.

    You should test this in a test environment using a comparable database load and size. This would also give you some estimate about how long it takes.