Tags: database, postgresql, dump, pg-dumpall

Removing duplicate schema/tables from a postgres database that had a dump all restored to it


Today I accidentally restored a pg_dumpall file to the same database that the dump was generated from. The restore has created a ton of duplicate rows, IDs, and the like.

This is a complex production database. The restore hasn't corrupted everything, but it's really affecting the ability to change things on the site that uses it. I know I could just cut my losses and reset the database from the original pg_dumpall (it's as recent as last night), but that would mean losing a full day's work and posting on the site.

Is there anything I can do to remedy this situation?


Solution

  • Without primary keys, this is hard. With primary keys, this couldn't have happened. The best thing you can do is, IMHO:

    1. rename all your schemas (e.g. schema -> bad_schema)
    2. re-import last night's dump (without dropping the DB, of course)
    3. add PK constraints to all the new tables (this requires manual work!)
    4. for every schema.table: add the rows from bad_schema.tableXXX to schema.tableXXX if they don't exist there yet (this may require (non-unique) indexes on bad_schema.tableXXX, corresponding to the intended FKs)
    5. (optionally) adjust the sequences to the current max(value) of their corresponding columns.
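
    The steps above can be sketched in SQL. The schema, table, column, and sequence names here (`public`, `posts`, `id`, `posts_id_seq`) are placeholders for illustration only; adapt them to the real schema:

    ```sql
    -- 1. Move the damaged schema out of the way.
    ALTER SCHEMA public RENAME TO bad_public;

    -- 2. Re-import last night's dump here (e.g. psql -f dumpall.sql),
    --    which recreates the schema in its pre-accident state.

    -- 3. Add the missing primary keys (one statement per table, manual work):
    ALTER TABLE public.posts ADD PRIMARY KEY (id);

    -- 4. Copy over today's rows, skipping any that already exist:
    INSERT INTO public.posts
    SELECT b.*
    FROM bad_public.posts b
    WHERE NOT EXISTS (
        SELECT 1 FROM public.posts p WHERE p.id = b.id
    );

    -- 5. Bump the sequence past the highest id now in use:
    SELECT setval('public.posts_id_seq',
                  (SELECT max(id) FROM public.posts));
    ```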

    If you don't understand the above (or if it is too much work), just take your losses and restore yesterday's backup. You could first rename the damaged database (or rename its schemas) and try to extract the new records from it later.


    Summary: do realize that every (partial) solution requires Primary Keys in some way. A table without a PK is fragile and meaningless.
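
    If the duplicates turn out to be exact row copies within a single table and a full re-import is off the table, one alternative sketch uses PostgreSQL's system column `ctid`, which is unique per physical row. This assumes you can name a column (or set of columns) that identifies a logical row; `posts` and `id` below are again hypothetical placeholders:

    ```sql
    -- Keep one physical copy of each logical row, delete the rest.
    -- Assumes rows with equal id are full duplicates of each other.
    DELETE FROM public.posts a
    USING public.posts b
    WHERE a.id = b.id
      AND a.ctid > b.ctid;
    ```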