Search code examples
postgresqlpostgresql-9.6wal

PostgreSQL 9.6 understanding wal files


I am trying to understand the behaviour of wal files. The wal related settings of the database are as follows:

"min_wal_size"  "2GB"   
"max_wal_size"  "20GB"
"wal_segment_size"  "16MB"
"wal_keep_segments" "0"
"checkpoint_completion_target"  "0.8"
"checkpoint_timeout"    "15min"

The number of wal files is always 1281 or higher:

SELECT COUNT(*) FROM pg_ls_dir('pg_xlog') WHERE pg_ls_dir ~ '^[0-9A-F]{24}';
-- count 1281

As I understand it this means wal files currently never fall below max_wal_size (1281 * 16 MB = 20496 MB = max_wal_size) ??

I would expect the number of wal files to decrease below maximum right after a checkpoint is reached and data is synced to disk. But this is clearly not the case. What am I missing?


Solution

  • As per the documentation (emphasis added):

    The number of WAL segment files in pg_xlog directory depends on min_wal_size, max_wal_size and the amount of WAL generated in previous checkpoint cycles. When old log segment files are no longer needed, they are removed or recycled (that is, renamed to become future segments in the numbered sequence). If, due to a short-term peak of log output rate, max_wal_size is exceeded, the unneeded segment files will be removed until the system gets back under this limit. Below that limit, the system recycles enough WAL files to cover the estimated need until the next checkpoint, and removes the rest

    So, as per your observation, you are probably observing the "recycle" effect -- the old WAL files are getting renamed instead of getting removed. This saves the disk some I/O, especially on busy systems.

    Bear in mind that once a particular file has been recycled, it will not be reconsidered for removal/recycle again until it has been used (i.e., the relevant LSN is reached and checkpointed). That may take a long time if your system suddenly becomes less active.