Tags: postgresql, pg-dump, cloning, pg-restore

Managing foreign keys when using pg_restore with multiple dumps


I have a bit of a weird issue. We were trying to create a database baseline for our local environment that has very specific data pre-seeded into it. Our hopes were to make sure that everyone was operating with the same data, making collaboration and reviewing code a bit simpler.

My idea for this was to run a command to dump the database whenever we run a migration or decide a new account is necessary for local dev. The issue with this is that the database dump is around 17 MB. I'm trying to avoid us having to add a 17 MB file to GitHub every time we update the database.

So the best solution I could think of was to set up a script to dump each individual table in the database. This way, if a single table is updated, we'd only be pushing that backup to GitHub, and it would be closer to a ~200 KB file as opposed to 17 MB.

The main issue I'm running into with this is trying to restore the database. With a full dump, handling the foreign keys is relatively simple as it's all done in a single restore command. But with multiple restores, it gets a bit more complicated.

I'm looking to find a way to restore all tables to a database, ignoring triggers and constraints, and then enabling them again once the data has been populated. (or find a way to export the tables based on the order the foreign keys are defined). There are a lot of tables to work with, so doing this manually would be a bit of a task.

I'm also concerned about the relational integrity of the database if I disable/re-enable constraints. Any help or advice would be appreciated.

Right now I'm running the following on every single table:

pg_dump postgres://user:password@pg:5432/database -t table_name -Fc -Z9 -f /data/www/database/data/table_name.bak

And then this command to restore all backups to the DB.

$data_command = "pg_restore --disable-triggers -d $dbUrl -Fc \"%s\"";
$backups = glob("$directory*.bak");
foreach ($backups as $data_file) {
    // glob() returns full paths, so compare against the basename
    if (basename($data_file) != 'data_roles.bak') {
        exec(sprintf($data_command, $data_file));
    }
}

This obviously doesn't work, as I hit a ton of "relation does not exist" errors. I guess I'm just looking for a better way to accomplish this.


Solution

  • I would separate the table data and the database metadata.

    Create pre- and post-data scripts with

    pg_dump --section=pre-data -f pre.sql mydb
    pg_dump --section=post-data -f post.sql mydb
    

    Then dump just the data for each table:

    pg_dump --section=data --table=tab1 -f tab1.sql mydb
    
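    With many tables, the per-table data dump can be scripted rather than typed by hand. A minimal sketch, assuming the tables live in the `public` schema; `dump_all_tables`, `PSQL`, and `PG_DUMP` are hypothetical names, and the two command variables exist only so the helper can be dry-run:

    ```shell
    # Defaults to the real binaries; override PSQL/PG_DUMP for a dry run.
    PSQL="${PSQL:-psql}"
    PG_DUMP="${PG_DUMP:-pg_dump}"

    dump_all_tables() {
        db_url="$1"; out_dir="$2"
        # -At: unaligned, tuples-only output -> one bare table name per line
        for t in $("$PSQL" "$db_url" -At -c \
            "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"); do
            "$PG_DUMP" "$db_url" --section=data -t "public.$t" \
                -f "$out_dir/$t.sql"
        done
    }
    ```

    Because each file holds only one table's data, only files for tables that actually changed will differ between commits.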

    To restore the database, first restore pre.sql, then all the table data, then post.sql.
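    That restore order can be sketched as a small script. `restore_all` is a hypothetical helper; the glob assumes the per-table dumps were written as plain SQL files named `tab*.sql`, as in the dump commands above:

    ```shell
    # Replay the three sections in dependency-safe order. PSQL is a
    # variable so the script can be dry-run; it defaults to psql.
    PSQL="${PSQL:-psql}"

    restore_all() {
        db="$1"; dir="$2"
        "$PSQL" -d "$db" -f "$dir/pre.sql"    # tables, types -- no constraints yet
        for f in "$dir"/tab*.sql; do          # data only, so order doesn't matter
            "$PSQL" -d "$db" -f "$f"
        done
        "$PSQL" -d "$db" -f "$dir/post.sql"   # foreign keys, indexes, triggers
    }
    ```

    Since no foreign keys exist until `post.sql` runs, the data files can be loaded in any order without "relation does not exist" or constraint-violation errors.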

    The pre- and post-data will change often, but they are not large, so that shouldn't be a problem.