Search code examples
postgresqlarchive

Postgresql WAL archive_command file compare


I have read in "PostgreSQL Replication by Hans-Jurgen Schonig", and a few places that it is best practice NOT to overwrite an archive WAL file during the execution of the archive_command - can anyone please expand on the reasons for this? and if the following scenario is valid for WAL overwriting?

I have written a script that will execute the following high level logic for an individual WAL archiving procedure:

if (/archive/00000001000000F700000067 exists and is readable) and (00000001000000F700000067 is byte by byte equal to /archive/00000001000000F700000067)
  exit with status 0
else
  if (copy 00000001000000F700000067 to /archive/00000001000000F700000067 is successful)
    if (/archive/00000001000000F700000067 exists and is readable) and (00000001000000F700000067 is byte by byte equal to /archive/00000001000000F700000067)
      exit with status 0
    else
      exit with status non-zero
  else
    exit with status non-zero

In short, this approach hopes to defend against at least the scenario where the original WAL file is archived incorrectly - the copy has valid filename but is corrupt (due to hardware failure for example). My understanding of the WAL archiving process in this instance:

  • The archive_command will return non-zero exit status during the byte-for-bye comparison after the copy procedure (let us assume the copy procedure has a false success response)
  • According to the documentation, upon non-zero exit_status WAL archiving will be re-attempted indefinitely - a second attempt of WAL archiving should occur
  • The archive_command will recognize that the existing archive WAL does not match the current WAL byte-wise
  • The copy procedure will occur a second time, hopefully overwriting the corrupt file
  • exit_status 0 in the event of a success comparison of the files, otherwise this process will be repeated

There is a very minor overhead involved with the comparison (which I may update to md5 check), can anyone see any problems that can arise from such an approach? or recommend anything further?

Thank you


Solution

  • If I read your pseudo-code correctly, you'll overwrite WAL segments in the archive if they are not identical to your current WAL-to-be-archived.

    The problem with that is that WAL archives are very important information, and you are scribbling over an archived WAL segment that somebody wrote before. If there is a misconfiguration where two PostgreSQL clusters write to the same archive directory by mistake, the WAL archive will be clobbered.

    It would be better to stop archiving in that case and alert an administrator.