Search code examples
postgresqldatabase-designinternationalizationlocalecollation

Best practise for storing multilingual strings


I need to store different versions of not very long strings for different languages (2-4 languages) in a Postgres table.

What is the best way of doing that? Array or JSON or something like that?


Solution

  • First make sure that the database locale can deal with different languages. Use a UTF-8 server-encoding. Optionally, set LC_COLLATE = 'C' to be on neutral ground, or use the collation of your main language to have a default sort order. Start by reading the chapter Collation Support in the manual.

    I would strongly suggest to use the latest version of Postgres (9.1 at time of writing) for superior collation support.

    As for the table structure: keep it simple. It sounds like there is a low, fixed number of languages to deal with. You could just have a column for each language then:

    CREATE TABLE txt (
      txt_id serial PRIMARY KEY
    , txt    text NOT NULL -- master language NOT NULL?
    , txt_fr text -- others can be NULL?
    , txt_es text
    , txt_de text
    );
    

    This is pretty efficient, even with many languages. NULL storage is very cheap.
    If you have a varying number of languages to deal with, or many updates for individual language strings, a separate table might be the better solution. This solution assumes that you have a "primary language", where the string is always present:

    CREATE TABLE txt (
      txt_id serial PRIMARY KEY
    , txt    text NOT NULL -- master language NOT NULL?
    );
    
    CREATE TABLE lang (
      lang_abbr text PRIMARY KEY -- de, es, fr, ...
    , lang      text NOT NULL
    , note      text
    );
    
    CREATE TABLE txt_trans (
      txt_id    int  REFERENCES txt(txt_id) ON UPDATE CASCADE ON DELETE CASCADE
    , lang_abbr text REFERENCES lang(lang_abbr) ON UPDATE CASCADE
    , txt       text NOT NULL -- master language NOT NULL?
    , CONSTRAINT txt_trans_pkey PRIMARY KEY (txt_id, lang_abbr)
    );
    

    Not treating the master language special and keeping all language variants in the same table might make handling in your app simpler. But it really depends on requirements.