Search code examples
postgresqlautovacuum

Understanding auto-vacuum and when it is triggered


We've noticed one of our tables growing considerably on PG 12. This table is the target of very frequent updates, with a mix of column types, including a very large text column (often with over 50kb of data) - we run a local cron job that looks for rows older than X time and set the text column to a null value (as we no longer need the data for that particular column after X amount of time).

We understand this does not actually free up disk space due to the MVCC model, but we were hoping that auto-vacuum would take care of this. To our surprise, the table continues to grow (now over 40gb worth) without auto-vacuum running. Running a vacuum manually has addressed the issue and we no longer see growth.

This has lead me to investigate other tables, I'm realising that I don't understand how auto-vacuum is triggered at all.

Here is my understanding of how it works, which hopefully someone can pick apart:

  • I look for tables that have a large amount of dead tuples in them: select * from pg_stat_all_tables ORDER BY n_dead_tup desc;
  • I identify tableX with 33169557 dead tuples (n_dead_tup column).
  • I run a select * from pg_class ORDER BY reltuples desc; to check how many estimated rows there are on table tableX
  • I identify 1725253 rows via the reltuples column.
  • I confirm my autovacuum settings: autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2
  • I apply the formula threshold + pg_class.reltuples * scale_factor, so, 50 + 1725253 * 0.2 which returns 345100.6

It is my understanding that auto-vacuum will start on this table once ~345100 dead tuples are found. But tableX is already at a whopping 33169557 dead tuples!, The last_autovacuum on this table was back in February.

Any clarification would be welcome.


Solution

  • Your algorithm is absolutely correct.

    Here are some reasons why things could go wrong:

    • autovacuum runs, but is so slow that it never gets done

      If you see no running autovacuum, that is not your problem.

    • autovacuum runs, but a long running open transaction prevents it from removing dead tuples

    • other tables need to be vacuumed more urgently (to avoid transaction ID wraparound), so the three workers are busy with other things

    • autovacuum runs, but conflicts with high concurrent locks on the table (LOCK TABLE, ALTER TABLE, ...)

      This makes autovacuum give up and try again later.

    • autovacuum is disabled, perhaps only for that table