Search code examples
postgresqlforeign-keyspostgresql-9.2corruption

Cause of PostgreSQL foreign key violation?


My PostgreSQL (9.2) database contains two tables registrations and attributes with a foreign key constraint:

postgres=# \d+ registrations;
                Table "public.registrations"
 Column  | Type  | Modifiers | Storage  | Stats target | Description
---------+-------+-----------+----------+--------------+-------------
 name    | text  | not null  | extended |              |
 parent  | text  |           | extended |              |
 storage | bytea |           | extended |              |
Indexes:
    "registrations_pkey" PRIMARY KEY, btree (name)
Referenced by:
    TABLE "attributes" CONSTRAINT "attributes_cname_fkey" FOREIGN KEY (cname) REFERENCES registrations(name) ON DELETE CASCADE
Has OIDs: no

postgres=# \d+ attributes;
                 Table "public.attributes"
 Column | Type  | Modifiers | Storage  | Stats target | Description
--------+-------+-----------+----------+--------------+-------------
 cname  | text  | not null  | extended |              |
 aname  | text  | not null  | extended |              |
 tags   | text  |           | extended |              |
 value  | bytea |           | extended |              |
Indexes:
    "attributes_pkey" PRIMARY KEY, btree (cname, aname)
Foreign-key constraints:
    "attributes_cname_fkey" FOREIGN KEY (cname) REFERENCES registrations(name) ON DELETE CASCADE
Has OIDs: no

At some point I realised that some rows violated the foreign key constraint:

postgres=# SELECT COUNT(*) FROM attributes LEFT JOIN registrations ON attributes.cname=registrations.name WHERE registrations.name IS NULL;
 count
-------
    71
(1 row)

Could you help me understand how this corruption could happen?


Solution

  • A constraint marked as NOT VALID is the one case you might expect to see violations, but the NOT VALID clause would show up in the psql \d+ output. (I believe it's possible to manually update this flag in the catalog, but I hope for your sake that this isn't the issue...)

    As far as I know, the only supported way of bypassing a foreign key check is to SET session_replication_role TO replica before modifying the data. This is there for the benefit of replication processes, operating under the assumption that the constraint has already been validated on the master - though this can certainly go wrong if your replicator is buggy or misconfigured.

    It's also possible for a superuser to manually disable the constraint's underlying triggers (and it's often tempting for someone trying to speed up a bulk import). The following will tell you if the triggers are currently active (tgenabled should be 'O'):

    SELECT *
    FROM pg_trigger
    WHERE tgname ~ '^RI_ConstraintTrigger'
      AND tgrelid IN ('registrations'::regclass, 'attributes'::regclass)
    

    I don't think there's any way of knowing whether this was temporarily changed in the past, though if you have statement logging enabled, you might find an ALTER TABLE ... DISABLE TRIGGER statement in there somewhere.

    There is also at least one loophole in the foreign key enforcement, and of course, it's always possible that you've found a bug...