Search code examples
postgresqlsql-order-bypostgresql-12collatepartial-index

How to index a multilanguage entity in PostgreSQL


Here I am creating table product_feature_text, having a 1:N relation with table product. As the application must support several user languages, a lang_code column is added to segment english texts from other language texts.

As I want to present the product features alphabetically ordered in every language, I have created four partial indexes with their specific collate. It is expected that all products features have title in all of the four languages, i.e., there will be 25% of rows with lang_code = 'ES', for example.

This is an oversimplification of the real case but enough to depict the situation.

create table product_feature_text (
  id          bigint generated by default as identity primary key,

  -- reference to the parent product
  product_id  bigint not null,

  -- language dependent columns
  lang_code   char(2),
  title       varchar,

  foreign key (product_id) references product (id)
);

create index on product_feature_text (title collate "en-US") where lang_code = 'EN';
create index on product_feature_text (title collate "es-ES") where lang_code = 'ES';
create index on product_feature_text (title collate "fr_FR") where lang_code = 'FR';
create index on product_feature_text (title collate "de_DE") where lang_code = 'DE';

Is this the best index approach for the case?

Addendum from a comment: a typical query would be

select text
from product_feature
where product_id = 1024
   and lang_code = 'FR'
order by title collate "fr_FR"

where product_id could be anything.


Solution

  • It depends on the intended use of the indexes.

    If you want to use them for

    SELECT ... FROM product_feature_text
    WHERE lang_code = 'EN' AND ...
    ORDER BY title COLLATE "en-US";
    

    your indexes might be useful.

    Also, if your query looks like

    WHERE product_feature_text > 'bhd'  COLLATE ...
    

    it might help.

    However, for most cases that I can envision, a single index whose collation doesn't matter would be better.

    For the query in the addendum, the perfect index would be:

    CREATE INDEX ON product_feature (product_id, title COLLATE "fr_FR")
       WHERE lang_code = FR';