Search code examples
databasepostgresqlnextclouddatabase-tuning

what should be minimum ratio of dead tuple for a table to be considered for VACUUM FULL in Postgres


I am a developer and looking for an advise on optimisation or maintenance of Postgres database.

I am currently investigating on commands which helps in clean up/defragmentation of DB and release some memory to filesystem as DB disk storage space is usage is growing quickly. I found that "VACUUM FULL" can help release memory used by dead tuples. However could not find information on how many or percentage of dead tuples should be there before we consider running this command.

Currently we have two tables in Nextcloud Postgres database which has dead tuples. Also total relation size for these tables is higher than the disk usage reported by \dt+ command. I am providing the stats below. Please advise if they are eligible for "VACUUM FULL" based on given stats.

###########################################
 Disk space usage per table (\dt+ command)
###########################################

Schema |            Name             | Type  |  Owner   |    Size    | Description 
--------+-----------------------------+-------+----------+------------+-------------
public | oc_activity                 | table | XXXXXXXX | 4796 MB    | 
public | oc_filecache                | table | XXXXXXXX | 127 MB     | 

#################################
 oc_activity total relation size
#################################

SELECT pg_size_pretty( pg_total_relation_size('oc_activity') )

----------------
 pg_size_pretty 
----------------
 9666 MB
########################################
 Additional stats for oc_activity table
########################################

 relid | schemaname |   relname   | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+-------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 yyyyy | public     | oc_activity |      272 |   1046966870 |     4737 |      57914604 |   1548217 |         0 |    325585 |             0 |   11440511 |     940192 |              268430 |             |                 |              | 2023-02-15 10:01:36.657028+00 |            0 |                0 |             0 |                 3
###################################
 oc_filecache total relation size
###################################

SELECT pg_size_pretty( pg_total_relation_size('oc_filecache') )

----------------
 pg_size_pretty 
----------------
 541 MB
#########################################
 Additional stats for oc_filecache table
#########################################

SELECT * FROM pg_stat_all_tables WHERE relname='oc_filecache'

 relid | schemaname |   relname    | seq_scan | seq_tup_read |  idx_scan  | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum |        last_autovacuum        | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+--------------+----------+--------------+------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 zzzzz | public     | oc_filecache |   104541 |  28525391484 | 1974398333 |    2003365293 |     43575 |    695612 |     39541 |        348823 |     461510 |      19418 |                4069 |             | 2023-02-16 10:46:15.165442+00 |              | 2023-02-16 16:25:32.568168+00 |            0 |                8 |             0 |                33

Solution

  • There is no hard rule. I personally would consider a table uncomfortably bloated if the pgstattuple extension showed that less than a third or a quarter of the table are user data and the rest is dead tuples and empty space.

    Rather than regularly running VACUUM (FULL) (which is downtime), you should strive to fix the problem that causes the table bloat in the first place.