Search code examples
postgresqlmaterialized-viewspostgresql-9.3

How can I prevent materialized views from refreshing during pg_restore?


I have created a dump of the database using pg_dump in "custom" format (-Fc). This format allows for pg_restore to be invoked with the "jobs" option (-j8). The jobs options starts 8 processes, and restores the vast majority of relations in my database within 10 minutes.

I'm left with 4 processes. One of them is the refresh of a materialized view, and the other 3 are indexes to be applied to 3 tables that the materialized view uses as data sources. The indexes are "waiting" according to pg_stat_activity, presumably because the REFRESH of the materialized view is still accessing the source tables.

When the indexes are in place, the refresh of the view only takes a couple of minutes. Because the indexes are not in place during the REFRESH, I cut the REFRESH process off at 17 hours, which made pg_restore fail.

How can I

  1. Force the order of items so the indexes get created first
  2. Turn off the refresh of the materialized view and do it manually later
  3. Manipulate the dump file in custom format to say "WITH NO DATA"
  4. Intercept the REFRESH MATERIALIZED VIEW statement and throw it in the trash

Or any other solution that gets the job done?


Solution

  • David G Johnston posted an answer for me on the pgsql-hackers mailing list.

    "Have/can you try the '-l (el) & -L' options to pg_restore?

    http://www.postgresql.org/docs/9.3/static/app-pgrestore.html

    (example of usage is toward the bottom of the page)

    Basically re-order the command sequence so that the materialize runs as late as possible, or just disable it altogether.

    pg_dump/pg_restore should be taught to handle this better, which is the main reason why Craig had you post here ASAP, but to get it functional for now manual intervention will be necessary. In theory the 'listing' capabilities should allow you to do what you need."

    I think this (pg_restore -l | pg_restore -L) will get me where I need to go for now by inserting a small shell script in between that pushes the materialized views to the end of the list, but then I will also have to manage my own dependencies for the items that I re-sort (MatViews of MatViews). This pretty seriously limits the usefulness of materialized views for me. For version 9.3.x, I'm likely to require MatView dependencies no more than 1 deep.

    Edit: To stop materializing the data on restore, I started doing this:

    pg_dump mydatabase -Fd backup_dir
    pg_restore -l  -Fd backup_dir | sed '/MATERIALIZED VIEW DATA/d' > ordered.lst
    pg_restore -L ordered.lst -Fd backup_dir mydatabase
    

    This removes the REFRESH MATERIALIZED VIEW statements from the restore. Thanks to David G Johnston for the tips.