Search code examples
postgresqlamazon-s3backupwal

Is it possible to restore db from WALs twice?


I have a main database server that WALs is periodically archived on s3. So s3 has a 'snapshot' of a database with all the corresponding latest WALs. I have another (local) database server that I want to periodically update to be actual to the state of the main database server. So I once copied "main" directory from s3 and applied all the WALs from s3 by using restore.conf The only thing I've changed in this file is:

restore_command = 'aws s3 cp s3://%bucketName%/database/pg_wal/%f %p'

It was successful. After some time I want to apply all the latest WALs from s3 to being "more synchronized" with a main database server. Is it possible to do it somehow? I know exactly, that I did not make any updates or writes into my "copied" database server. When I'm trying to do it in the exactly same way as before I am getting the next errors (from stderr):

fatal error: An error occurred (404) when calling the HeadObject 
operation: Key "database/pg_wal/00000001000001EF0000001F" does not 
exist
fatal error: An error occurred (404) when calling the HeadObject 
operation: Key "database/pg_wal/00000002.history" does not exist
fatal error: An error occurred (404) when calling the HeadObject 
operation: Key "database/pg_wal/00000001.history" does not exist
fatal error: An error occurred (403) when calling the HeadObject 
operation: Forbidden
fatal error: An error occurred (403) when calling the HeadObject 
operation: Forbidden
fatal error: An error occurred (403) when calling the HeadObject 
operation: Forbidden
fatal error: An error occurred (403) when calling the HeadObject 
operation: Forbidden
fatal error: An error occurred (403) when calling the HeadObject 
operation: Forbidden

This is a more detailed description of my procedure:

I have a two directories on s3: basebackup and pg_wal. basebackup contains base, global, pg_logical, pg_multixact, pg_xact, PG_VERSION, backup_label files.

When I recover it the first time, I do the following:

  1. Stop postgres

  2. aws s3 sync s3://%bucketname%/basebackup ~/10/main

  3. mkdir empty directories in ~/10/main

  4. copied recovery.conf.sample into ~/10/main/recovery.conf

  5. edit recovery.conf as above

  6. start PostgreSQL

When I'm doing it again after some time I'm doing steps 1, 4, 5, 6 and getting the described result.

Probably, I need to somehow specify the first WAL from s3 bucket to being restored? Because we already restored some of them before. Or it is impossible at all?


Solution

  • There seems to be a lot wrong with your procedures:

    • A complete backup does not only consist of the files and directories you list above, but of the complete data directory (pg_wal/pg_xlog can be empty).

    • After the first recovery, PostgreSQL will choose a new time line, rename backup_label and recovery.conf and come up as a regular database.

      You cannot resume recovering such a database. I don't know what exactly you did to get into recovery mode again, but you must have broken something.

    Once a database has finished recovery, the only way to recover further is to restore the initial backup again and recover from the beginning.

    Have you considered using point-in-time recovery with recovery_target_action = 'pause'? Then PostgreSQL will stay in recovery mode, and you can run queries against the database. To continue recovering, define a new recovery target and restart the server.