I'm using PostgreSQL v9.1 for my organization. The database is hosted in Amazon Web Services (EC2 instance) below a Django web-framework which performs tasks on the database (read/write data). The problem is, to backup this database in a periodic fashion in a specified format (see Requirements).
Requirements:
But I have the following constraints too.
postgresql.conf
and authenticate users in pg_hba.conf
which implies I need a server-restart to turn it on which again, stops the incoming data for some minutes. This is not permitted (see above constraint).Is there a clever way to backup the master-db for my needs? Is there a tool which can automate this job for me?
This is a very crucial requirement as data has begun to appear into the master-db since few days and I need to make sure there's replication of master-db on some standby-server all the time.
If, and only if, your entire database including pg_xlog
, data
, pg_clog
, etc is on a single EBS volume, you can use EBS snapshots to do what you describe because they are (or claim to be) atomic. You can't do this if you stripe across multiple EBS volumes.
The general idea is:
Take an EBS snapshot using the EBS APIs using command line AWS tools or a scripting interface like the wonderful boto Python library.
Once the snapshot completes, use AWS API commands to create a volume from it and attach the volume your instance, or preferably to a separate instance, and then mount it.
On the EBS snapshot you will find a read-only copy of your database from the point in time you took the snapshot, as if your server crashed at that moment. PostgreSQL is crashsafe, so that's fine (unless you did something really stupid like set fsync=off
in postgresql.conf
). Copy the entire database structure to your final backup, e.g archive it to S3 or whatever.
Unmount, unlink, and destroy the volume containing the snapshot.
This is a terribly inefficient way to do what you want, but it will work.
It is vitally important that you regularly test your backups by restoring them to a temporary server and making sure they're accessible and contain the expected information. Automate this, then check manually anyway.
If your volume is mapped via LVM, you can do the same thing at the LVM level in your Linux system. This works for the lvm-on-md-on-striped-ebs configuration. You use lvm snapshots instead of EBS, and can only do it on the main machine, but it's otherwise the same.
You can only do this if your entire DB is on one file system.
You're going to have to restart the database. You do not need to restart it to change pg_hba.conf
, a simple reload (pg_ctl reload
, or SIGHUP
the postmaster) is sufficient, but you do indeed have to restart to change the archive mode.
This is one of the many reasons why backups are not an optional extra, they're part of the setup you should be doing before you go live.
If you don't change the archive mode, you can't use PITR, pg_basebackup, WAL archiving, pgbarman, etc. You can use database dumps, and only database dumps.
So you've got to find a time to restart. Sorry. If your client applications aren't entirely stupid (i.e. they can handle waiting on a blocked tcp/ip connection), here's how I'd try to do it after doing lots of testing on a replica of my production setup:
postgresql.conf
to set the desired archive mode. Make any other desired restart-only changes at the same time, see the configuration documentation for restart-only parameters.SIGCONT
pgbouncer, wait for it to finish, and repeat.psql
I'd rather explicitly set pgbouncer to a "hold all connections" mode, but I'm not sure it has one, and don't have time to look into it right now. I'm not at all certain that SIGSTOP
ing pgbouncer
will achieve the desired effect, either; you must experiment on a replica of your production setup to ensure that this is the case.
Use WAL archiving and PITR, plus periodic pg_dump
backups for extra assurance.
See:
... and of course, the backup chapter of the user manual, which explains your options in detail. Pay particular attention to the "SQL Dump" and "Continuous Archiving and Point-in-Time Recovery (PITR)" chapters.
PgBarman automates PITR option for you, including scheduling, and supports hooks for storing WAL and base backups in S3 instead of local storage. Alternately, WAL-E is a bit less automated, but is pre-integrated into S3. You can implement your retention policies with S3, or via barman.
(Remember that you can use retention policies in S3 to shove old backups into Glacier, too).
Outages happen.
Outages of single-machine setups on something as unreliable as Amazon EC2 happen a lot.
You must get failover and replication in place. This means that you must restart the server. If you do not do this, you will eventually have a major outage, and it will happen at the worst possible time. Get your HA setup sorted out now, not later, it's only going to get harder.
You should also ensure that your client applications can buffer writes without losing them. Relying on a remote database on an Internet host to be available all the time is stupid, and again, it will bite you unless you fix it.