postgresql database-replication transaction-log

Postgres WAL archiving is not working as intended; does anyone have any suggestions?


I have streaming replication set up between two servers (master/slave), and that is working fine. The archiving part is only half working: WAL files keep accumulating and older ones are never deleted. Can anyone suggest a solution?

I've tried taking a new base backup, restarting streaming replication, etc., but I keep getting the following errors in the log:

2019-08-27 07:13:14 +08  DETAIL:  The failed archive command was: test ! -f /var/lib/pgsql/data/pg_xlog/000000010000028000000068 && cp pg_xlog/000000010000028000000068 /var/lib/pgsql/data/pg_xlog/000000010000028000000068
2019-08-27 07:13:15 +08  LOG:  archive command failed with exit code 1
2019-08-27 07:13:15 +08  DETAIL:  The failed archive command was: test ! -f /var/lib/pgsql/data/pg_xlog/000000010000028000000068 && cp pg_xlog/000000010000028000000068 /var/lib/pgsql/data/pg_xlog/000000010000028000000068
2019-08-27 07:13:15 +08  WARNING:  transaction log file "000000010000028000000068" could not be archived: too many failures

I've checked and the file is there:

-rw------- 1 postgres postgres 16777216 Aug 27 06:44 000000010000028000000068

Archive status:

-rw------- 1 postgres postgres 0 Aug 27 06:44 000000010000028000000068.ready

The Postgres version is 9.2.23; unfortunately, upgrading isn't an option.

This is the archiving section of postgresql.conf on the master:

# - Archiving -

archive_mode = on       # allows archiving to be done
                                # (change requires restart)
#archive_command = '/bin/true'          # Used for troubleshooting archiving to temporarily start postgres.
archive_command = 'test ! -f /var/lib/pgsql/data/pg_xlog/%f && cp %p /var/lib/pgsql/data/pg_xlog/%f'            # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0            # force a logfile segment switch after this
                                # number of seconds; 0 disables
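The placeholder comments above can be checked directly. This shell sketch substitutes %p and %f into the configured archive_command; the segment name comes from the log, the value of %p (pg_xlog/ plus the file name, relative to the data directory) is read off the logged command, and the sed-based substitution only approximates what the server does internally.

```shell
#!/bin/sh
# Sketch: expand the archive_command placeholders as the config comments describe
# (%p = path of file to archive, %f = file name only).
# Values are copied from the log; sed substitution is an illustration only.

archive_command='test ! -f /var/lib/pgsql/data/pg_xlog/%f && cp %p /var/lib/pgsql/data/pg_xlog/%f'
f='000000010000028000000068'   # %f: file name only
p="pg_xlog/$f"                  # %p: path relative to the data directory

# Replace every %p and %f occurrence (global substitution, '|' as delimiter).
expanded=$(printf '%s\n' "$archive_command" | sed -e "s|%p|$p|g" -e "s|%f|$f|g")
printf '%s\n' "$expanded"
```

The result matches the DETAIL line in the log character for character. Assuming the data directory is /var/lib/pgsql/data, the test path and the cp destination both name the very segment being archived.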

This is part of recovery.conf on the slave:

restore_command = 'cp -p /var/lib/pgsql/data/pg_xlog/%f %p'
trigger_file = '/var/lib/pgsql/i_am_master.pg.trigger'
recovery_target_timeline = 'latest'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/data/pg_xlog %r'

Is there anything else I need to check?


Solution

  • Since the log shows no error message from cp, the part of the archive_command that failed was probably:

    test ! -f /var/lib/pgsql/data/pg_xlog/%f
    

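    That would mean that a file of that name already exists in the archive: test ! -f then exits with status 1, the && skips the cp, and the whole command fails with exit code 1, which is exactly what the log reports.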

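    Investigate how the file got there and delete it if that is safe; WAL archiving will then resume. Until a segment has been archived, the master will not remove or recycle it, which is why old files keep piling up in pg_xlog.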

    The test is there to avoid accidentally overwriting WAL files archived by somebody else.


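    The name of your archive directory suggests you are archiving directly into a cluster's pg_xlog directory. If /var/lib/pgsql/data is the master's own data directory, the cp source and destination are the same file (%p is relative to the data directory), so the test fails for every segment. That won't do. You need a separate, shared directory: one cluster archives to it and the other restores from it. The restore_command and archive_cleanup_command on the slave should point at that directory too, not at pg_xlog.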
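The difference between the two layouts can be reproduced without a server. This shell sketch runs the same test-then-cp pattern inside a throwaway directory tree; the names pgdata and archive are hypothetical stand-ins, and the first case assumes the archive target is the data directory's own pg_xlog. Only the exit statuses matter: a status of 1 is what PostgreSQL reports as "archive command failed with exit code 1" before retrying.

```shell
#!/bin/sh
# Sketch: compare archiving into the data directory's own pg_xlog with
# archiving into a separate directory. pgdata/ and archive/ are hypothetical
# stand-ins created under a temp dir; no real cluster is touched.
set -u

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/pgdata/pg_xlog" "$work/archive"
seg=000000010000028000000068
printf 'wal' > "$work/pgdata/pg_xlog/$seg"
cd "$work/pgdata" || exit 1   # archive_command runs from the data directory

# Failing layout: the "archive" is pg_xlog itself, so the destination
# already exists (it is the segment), test returns 1 and cp is skipped.
test ! -f "$work/pgdata/pg_xlog/$seg" && cp "pg_xlog/$seg" "$work/pgdata/pg_xlog/$seg"
same_dir_status=$?

# Separate archive directory: the first attempt copies the segment.
test ! -f "$work/archive/$seg" && cp "pg_xlog/$seg" "$work/archive/$seg"
first_status=$?

# A second attempt refuses to overwrite the already archived copy.
test ! -f "$work/archive/$seg" && cp "pg_xlog/$seg" "$work/archive/$seg"
second_status=$?

echo "same dir: $same_dir_status, first: $first_status, second: $second_status"
```

It prints `same dir: 1, first: 0, second: 1`. On the real servers the same idea means pointing archive_command on the master, and restore_command and archive_cleanup_command on the slave, at one directory both can reach, such as the /mnt/server/archivedir path used in the sample comment in postgresql.conf.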