Search code examples
phplinuxpostgresqlsortingpg-dump

Sorting postgresql database dump (pg_dump)


I am creating to pg_dumps, DUMP1 and DUMP2.

DUMP1 and DUMP2 are exactly the same, except DUMP2 was dumped in REVERSE order of DUMP1.

Is there anyway that I can sort the two DUMPS so that the two DUMP files are exactly the same (when using a diff)?

I am using PHP and linux. I tried using "sort" in linux, but that does not work...

Thanks!


Solution

  • From your previous question, I assume that what you are really trying to do is compare to databases to see if they are they same including the data.

    As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.

    Here is a way that you can do the total comparison including schema and data.

    First, compare the schema using this method.

    Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).

    The query below generates the COPY statements.

    select
        'copy (select * from '||r.relname||' order by '||
        array_to_string(array_agg(a.attname), ',')||
        ') to STDOUT;'
    from
        pg_class r,
        pg_constraint c,
        pg_attribute a
    where
        r.oid = c.conrelid
        and r.oid = a.attrelid
        and a.attnum = ANY(conkey)
        and contype = 'p'
        and relkind = 'r'
    group by
        r.relname
    order by
        r.relname
    

    Running that query will give you a list of statements like copy (select * from test order by a,b) to STDOUT; Put those all in a text file and run them through psql for each database and then compare the output files. You may need to tweak with the output settings to COPY.