Search code examples
sqlpostgresqldatabase-designunique-index

Best practice to enforce uniqueness on column but allow some duplicates?


Here is what I am trying to figure out: there should be a table to store authorizations for our new client management system, and every authorization has their unique identifier. This constraint would be pretty easy to translate to SQL, but unfortunately because of the slowness of bureaucracy, sometimes we need to create an entry with a placeholder ID (e.g., "temp") in order for the client to be able to start taking services.

What would be the best practice to enforce this conditional uniqueness constraint?

These are what I could come up with my limited experience:

  • Use partial indexing mentioned in the PostgreSQL manual (5.3.3. -> Example 11-3.). It also mentions that This is a particularly efficient approach when there are few successful tests and many unsuccessful ones. In our legacy DB that will be migrated, there are 130,000 rows and about 5 temp authorizations a month, but the whole table only grows by about 200 rows per year. Would this be the right approach? (I am also not sure what "efficient" means in this context.)
  • Create a separate table for the temp authorizations but then it would duplicate the table structure.
  • Define a unique constraint for a group of columns. An authorization is for a specific service for a certain time period issued to an individual.

EDIT:

I'm sorry I think my description of the authorization ID was a bit obscure: it is provided by a state department with the format of NMED012345678 and it is entered by hand. It is unique, but sometimes only provided at a later time for unknown reasons.


Solution

  • There is a simple, fast and secure way:

    Add a boolean column to mark temporary entries which is NULL by default, say:

    temp bool DEFAULT NULL CHECK (temp)
    

    The added check constraint disallows FALSE, only NULL or TRUE are possible. Storage cost for the default NULL value is typically ... nothing - unless there are no other NULL values in the row.

    The column default means you don't normally have to take care of the column. It's NULL by default (which is the default default anyway, I'm just being explicit here). You only need to mark the few exceptions explicitly.

    Then create a partial unique index like:

    CREATE UNIQUE INDEX tbl_unique_id_uni ON tbl (unique_id) WHERE temp IS NULL;
    

    That only includes rows supposed to be unique. Index size is not increased at all. Be sure to add the predicate WHERE temp IS NULL to queries that are supposed to use the unique index.

    Related: