Search code examples
postgresqlreplicationpg-dumppg-restorebucardo

Bootstrap bucardo replication after pg_restore


Currently I am setting up Master/Master Replication with bucardo between 5 Nodes on different locations (should provide location transparency). The database holds ~500 Tables which should be replicated. I grouped them into smaller replication herds of 50 Tables at maximum based on their dependency on each other. All tables have primary keys defined and the sequencers on each node are set up to provide system wide unique identities (based on residue class)

To get an initial database on each node, I made a --data-only custom format pg_dump into a File and restored this on each node via pg_restore. Bucardo sync is setup with the bucardo_latest strategy to resolve conflicts. Now when I start syncing bucardo is deleting all datasets in the origin database first and inserting it again from one of the restored nodes, because all restored datasets have a "later timestamp" (the point in time when I called pg_restore). This ultimately prohibits the inital startup as bucardo needs very much time and also fails, as there are lots of datasets to solve and timeouts often too short.

I also have 'last_modified' timestamps on each table which are managed by UPDATE triggers, but as I understand it, pg_dump inserts data via COPY, and therefore these triggers don't get fired.

  • Which timestamp does bucardo use to find out who is bucardo_latest?
  • Do I have to call pg_dump with something like set SESSION_REPLICATION_ROLE = 'replica';?

I just want bucardo to keep track of every new change, not executing pseudo changes because of the restore.

EDIT: pg_restore has definitely fired several triggers at restore time...as said I keep track on user and last modification date in each table, and those values are set to the user and timestamp when the restore was done. I am aware, that I can set SESSION_REPLICATION_ROLE for a plain text format restore via psql. Is this also possible for pg_restore somehow?


Solution

  • The common approach is make the dump/restore process before configure the replication.

    So an option will be:

    1. drop the bucardo schema in each database
    2. do a bucardo remove for each object (most of them allow use all, like bucardo remove table all
    3. dump/restore your data
    4. Configure again the replication. Just make sure that when adding the sync, set the option onetimecopy=0. It's the default but I feel safer making it explicit.

    Which timestamp does bucardo use to find out who is bucardo_latest?

    bucardo handles its own timestamp value. Each table should have a trigger named like bucardo.delta_myschema_mytable that makes and insert in a table named like bucardo.delta_myschema_mytable. This table has a column txntime timestamp with time zone not null default now() and this is the timestamp used.

    Do I have to call pg_dump with something like set SESSION_REPLICATION_ROLE = 'replica';?

    AFAIK, if bucardo triggers are already set in the tables, the option --disable-triggers of pg_restore should do the trick.

    You can also check these articles about working with large databases and the use of session_replication_role