Search code examples
sqlpostgresqlvalidationindexingdatabase-design

How to enforce uniqueness across two columns within the scope of a third?


Given a table with three columns:

CREATE TABLE tbl
  column_scope text
, col1 int
, col2 int
);

I want to enforce, partitioned by column_scope, that values in col1, col2, are unique across both columns - except for duplicates within the same row. That is, within the scope, no two values in col1 and col2 can be the same, except for NULL or for them to be the same row.

I tried adding an unique index (column_scope, col1, col2), however this would fail since it wouldn't take into consideration invalid cases such as:

row_1 -> (scope_1, x, NULL);
row_2 -> (scope_1, NULL, x);

or

row_1 -> (scope_1, x, y);
row_2 -> (scope_1, y, z);

I also tried adding a constraint to the table, however I can't seem to get the conditions right for assuring that col1 and col2 don't have shared values for a given column_scope.

This is different from other questions as I don't care about the particular combination of values, I only care that values are not repeated among different columns, all within the scope of the third column.


Solution

  • Buckle up. This is an advanced solution.
    Install two additional modules first (once per database): intarray and btree_gist. Then we can make this work with a single exclusion constraint:

    CREATE EXTENSION intarray;    -- required!
    CREATE EXTENSION btree_gist;  -- required!
    
    CREATE TABLE tbl (
      column_scope text NOT NULL
    , col1 int  -- can be null
    , col2 int  -- can be null
    );
    
    -- Add THIS exclusion constraint !!!
    ALTER TABLE tbl ADD CONSTRAINT tbl_cross_col_unique
    EXCLUDE USING gist (column_scope WITH =
                      , array_remove(ARRAY[col1, col2], null) gist__int_ops WITH &&);
    

    fiddle

    Basically, the exclusion constraint disallows to "overlap" arrays built from col1 and col2 for the same column_scope.

    Note how I assume type integer for col1 & col2. Keeps it simple. For other types, you need to do more ...

    You want to allow null values in col1 and col2. So exclude those from the exclusion constraint. Lucky coincident - null values wouldn't be allowed in the generated array anyway.
    The manual on array_remove():

    Comparisons are done using IS NOT DISTINCT FROM semantics, so it is possible to remove NULLs.

    It all happens to fall into place.

    There are many intricate details to this, concerning data types, index method, operator class, null values, index optimization, ... But that's beyond the scope of a simple question and enters the realm of paid consulting.

    Closely related case with more details:

    Alternatives

    Alternatively, you could merge col1 and col2 into a single column of a separate table in a 1:n relationship (and add a tag to define the type). Either replacing what you have now, or additionally, just to enforce your constraint. Then, a plain UNIQUE constraint can be applied. See:

    Simpler case: just disallow composite dupes with switched values

    This expression index implements a UNIQUE index where (1,2) and (2,1) for (col1, col2) are considered equal:

    CREATE UNIQUE INDEX tbl_uni_idx ON tbl
       (column_scope, GREATEST(col1, col2), LEAST(col1, col2));
    

    See:

    Null values are not considered equal unless you add the NULLS NOT DISTINCT clause - which requires Postgres 15 or later. See: