Search code examples
regexpostgresqlpsqlpg-dump

How can I match all tables except one using --exclude-table-data flag in pg_dump?


I'm using psql 13.4 to make a pg_dump of a database. I need only the schema (no data) for all tables except one enumerated reference table, for which I need the data. I see that pg_dump has an --exclude-table-data flag that supports regex patterns. But it does not have an --include-table-data flag.

So to exclude table data for all tables except the single reference table, I thought I could use a regex with negative look ahead. For example this one: ^(?!.*(reference_table_name)). However this doesn't seem to work due to psql's special character limitations (e.g. on *, ?, and .) described here. I've tried to make the substitutions it suggests but it doesn't work.

Is what I'm trying to do even possible?


Solution

  • FWIW, this actually can be done with the restricted regular expressions available in this context – but whether you’d want to actually do this is another matter, because pretty it ain’t.

    Namely, to exclude the data of all tables except sometable, you have to write this:

    (|[^s]*|s(|[^o]*|o(|[^m]*|m(|[^e]*|e(|[^t]*|t(|[^a]*|a(|[^b]*|b(|[^l]*|l(|[^e]*|e?*)))))))))
    

    How does this monstrosity come about? The key to understand it is that it is a composition of copies of the following basic pattern:

    (|[^s]*|s*)
    

    This alternation will match one of the following:

    • nothing
    • any character except “s”, followed by anything (let’s call this the negative case)
    • the character “s”, followed by anything (the positive case)

    So by itself, this construct is equivalent to * and just matches anything. However, by separating the cases, it creates a hook for conditions to be added to any of them. (Any mathematicians reading this should instantly recognize this move.)

    Now let’s vary the the positive case slightly:

    (|[^e]*|e?*)
    

    Now, the positive case matches the character “e” only if followed by at least one more character. So this pattern will match almost anything: anything except the string e.

    Now we take two of these patterns, one of each:

    (|[^l]*|l*)
    (|[^e]*|e?*)
    

    … and we restrict the positive case of the first pattern by replacing its * with the second pattern, i.e. we take “an ‘l’ followed by anything” and replace “anything” with “anything except just an ‘e’”:

    (|[^l]*|l(|[^e]*|e?*))
    

    So now we have a pattern which will, in total, match one the following things:

    • nothing
    • any character except “l”, followed by anything
    • the character “l”, followed by nothing
    • the character “l”, then any character except “e”, followed by anything
    • the characters “l” then “e”, followed by at least one more character

    … which means this pattern matches any string except le.

    Now we iterate by taking another instance of the basic pattern:

    (|[^b]*|b*)
    (|[^l]*|l(|[^e]*|e?*))
    

    … and repeating the same substitution:

    (|[^b]*|b(|[^l]*|l(|[^e]*|e?*)))
    

    By the same token as previously, this matches any string except ble.

    So. Let’s start over with a whole series of copies of the same pattern:

    (|[^s]*|s*)
    (|[^o]*|o*)
    (|[^m]*|m*)
    (|[^e]*|e*)
    (|[^t]*|t*)
    (|[^a]*|a*)
    (|[^b]*|b*)
    (|[^l]*|l*)
    (|[^e]*|e?*)
    

    Folding these together yields the pattern I gave at the start – which therefore is a pattern that matches any string except sometable.

    This is how you can do it. It’s up to you whether you’d actually want to. Maybe putting a link to this answer in a comment next to the code will make it more justifiable…