Tags: postgresql, amazon-web-services, rds

postgres dump compression options to AWS RDS


I'm currently pushing local data up to an AWS RDS database, but because of the data size this is taking a long time. I was hoping someone might be able to suggest a way to compress the dump file before sending it up (and have it read correctly at the other end).

Currently the dump file is being written thus:

pg_dump -t properties_staging --data-only project_development > data/properties_staging.dump

And pushed up to AWS this way:

psql -a -h blahblahblah.blahblahblah.eu-west-1.rds.amazonaws.com -U master -d ebdb -p 5432 < data/properties_staging.dump

What are the best options available for slimming this dump file down while still punting it up to RDS?

All suggestions very gratefully received.

Thanks


Solution

  • pg_dump supports several options that may be faster and/or result in smaller files. From the documentation, here are some options to try (see the sketch after the list). Note that the best choice is very dependent on your data, and some of these formats will require you to restore with pg_restore instead of psql.

    pg_dump options

    • --format=custom uses PostgreSQL's custom binary format, which is compressed by default.
    • --format=directory writes the dump as a directory of files, compressed by default. This format allows parallel dumps.
    • --jobs=N runs N parallel workers for the dump and compression (directory format only). May speed up the operation.
    • --compress=N takes a number from 0 to 9. Zero is no compression, 1 is fast, low-CPU compression, and 9 is the slowest, highest-CPU compression with the smallest output.
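
    For example, a compressed dump of the same table using the custom format might look like this (a sketch based on the commands in the question; adjust the compression level to taste):

    pg_dump -t properties_staging --data-only --format=custom --compress=9 project_development > data/properties_staging.dump

    Custom-format dumps are then restored with pg_restore instead of psql:

    pg_restore -h blahblahblah.blahblahblah.eu-west-1.rds.amazonaws.com -U master -d ebdb -p 5432 --data-only data/properties_staging.dump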

    Again, which combination will be fastest? I don't know; test and benchmark for your specific situation. It will likely be faster if you copy the output of pg_dump to an EC2 instance (with a reasonably fast CPU, not a micro or t2) in the same availability zone as the RDS instance and run pg_restore or psql from there.

    Note that pg_restore has a --jobs=N flag too. This setting doesn't need to match the one used for pg_dump.
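
    Putting the directory format and parallel jobs together, a sketch might look like the following. Parallelism works per table, so --jobs only really pays off when the dump covers more than one table; this example therefore dumps the whole database rather than a single table, and the job counts are illustrative:

    pg_dump --format=directory --jobs=4 --file=data/project_development_dir project_development

    pg_restore -h blahblahblah.blahblahblah.eu-west-1.rds.amazonaws.com -U master -d ebdb -p 5432 --jobs=8 data/project_development_dir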