postgresql sql-order-by postgresql-12 collate partial-index

How to index a multilanguage entity in PostgreSQL

Here I am creating table product_feature_text, having a 1:N relation with table product. As the application must support several user languages, a lang_code column is added to segment english texts from other language texts.

As I want to present the product features alphabetically ordered in every language, I have created four partial indexes with their specific collate. It is expected that all products features have title in all of the four languages, i.e., there will be 25% of rows with lang_code = 'ES', for example.

This is an oversimplification of the real case but enough to depict the situation.

create table product_feature_text (
  id          bigint generated by default as identity primary key,

  -- reference to the parent product
  product_id  bigint not null,

  -- language dependent columns
  lang_code   char(2),
  title       varchar,

  foreign key (product_id) references product (id)
);

create index on product_feature_text (title collate "en-US") where lang_code = 'EN';
create index on product_feature_text (title collate "es-ES") where lang_code = 'ES';
create index on product_feature_text (title collate "fr_FR") where lang_code = 'FR';
create index on product_feature_text (title collate "de_DE") where lang_code = 'DE';

Is this the best index approach for the case?

Addendum from a comment: a typical query would be

select text
from product_feature
where product_id = 1024
   and lang_code = 'FR'
order by title collate "fr_FR"

where product_id could be anything.

Solution

It depends on the intended use of the indexes.

If you want to use them for

SELECT ... FROM product_feature_text
WHERE lang_code = 'EN' AND ...
ORDER BY title COLLATE "en-US";

your indexes might be useful.

Also, if your query looks like

WHERE product_feature_text > 'bhd'  COLLATE ...

it might help.

However, for most cases that I can envision, a single index whose collation doesn't matter would be better.

For the query in the addendum, the perfect index would be:

CREATE INDEX ON product_feature (product_id, title COLLATE "fr_FR")
   WHERE lang_code = FR';