Search code examples
postgresqlconstraintsquery-optimization

Postgres caseinsensitive unique key constraint: What is the fast option?


I want to have a case-insensitive unique key on a large table.

I'm looking for the fastest approach.

I found two options: my_table_a with a generated column and my_table_b with a unique index

CREATE TABLE my_table_a
(
  my_table_id_a int GENERATED ALWAYS AS IDENTITY
    CONSTRAINT pk_my_table_a PRIMARY KEY,
  name text NOT NULL,
  key text GENERATED ALWAYS AS (LOWER(name)::text) STORED NOT NULL
);

CREATE TABLE my_table_b
(
  my_table_id_b int GENERATED ALWAYS AS IDENTITY
    CONSTRAINT pk_my_table_b PRIMARY KEY,
  name text NOT NULL
);

CREATE UNIQUE INDEX unique_name ON my_table_b (LOWER(name));

But I'm not sure what is happening on an insert. I fear that for a single insert, it is required to calculate for all rows the unique key.

What do you think is the fastest solution?


Solution

  • As demonstrated in a similar thread here, it'll be even faster if you use the lower() expression-based index, but also make it use C collation:

    create unique index on my_table_b(lower(name) collate "C");
    

    Here you can find a few benchmarks comparing citext, lower() expression based index with default collation, a plain index using a custom case-insensitive collation as well as citext with a C collation.
    You can tweak it or run your own tests but in that one, the example above won.

    If you wanted to speed anything up but generating the column, that's not necessary - it will only duplicate the work and take more space.

    You can set up unique as a constraint or as an index, but unique constraints are technically always enforced by an implicitly created index. When you add new values, Postgres tries to save them to the table as well as ingest them into the index by searching for a spot in it - if it finds that its occupied, depending on currently configured deferrability it'll raise an exception. You don't have to fear it'll ever try to seq scan the table to check that, all of this logic takes place in the index.

    Indexes, indexes later added to the table as constraints and constraints specified on the table directly have slightly different syntax, behaviour and limitations, e.g.: the constraint doesn't let you pick the collation. Hence, in this case, your best bet is the expression index with a collation, directly on the name column.