How to support multiple languages in a database schema?


I want my database to support multiple languages for all text values in its tables.

What is the best approach to do that?

Edit 1:

For example, I have this "Person" table:

ID int
FirstName  nvarchar(20)
LastName   nvarchar(20)
Notes      nvarchar(max)
BirthDate  date
...........

Suppose I want my program to support a new language, say French.

Should I add a new column every time I add a new language? My "Person" table would then look like this:

ID int
FirstName_en  nvarchar(20)
FirstName_fr  nvarchar(20)
LastName_en   nvarchar(20)
LastName_fr   nvarchar(20)
Notes_en      nvarchar(max)
Notes_fr      nvarchar(max)
BirthDate     date
...........

Or should I add two new tables, one for the languages and another for the translated "Person" values?

That would look like this. "Languages" table:

ID           int
Lang-symbol  nvarchar(4)

"Person" table:

ID         int
BirthDate  Date

and finally the "Person_Translation" table:

LangID        int
PersonID      int
Translation   nvarchar(max)

Or is there something better?


Solution

  • I have had to deal with this in a questionnaire database. Multiple questionnaires needed to be translated into multiple languages (English, Japanese, Chinese).

    We first identified all text columns that would be printed on the questionnaires. For each of these we needed to be able to store a translation. For each table with text columns that required translation, we then created a _translations table with a foreign key pointing to the primary key of the original table, a foreign key to our language table, and a Unicode column for each text field that required translation. In these text columns we stored the translations for each language we needed.

    So a typical query would look like:

    select     p.id
    ,          pt.product_name
    ,          pt.product_description
    from       product                  p
    inner join product_translations pt
    on         p.id = pt.product_id
    and        'fr' = pt.language_code
    

    So there is always just one extra join (per table) to obtain the translations.

    I should point out that we only had to deal with a limited number of tables, so it was not a big issue to maintain a few extra %_translations tables.

    We did consider adding columns for each new language, but decided against it for a couple of reasons. First of all, the number of languages to be supported was not known, but could be substantial (10, 20 languages or maybe more). Combined with the fact that most tables had at least 3 distinct human-readable columns, we would have had to add many, many text columns, which would have resulted in very wide rows. So we decided not to do that.

    Another approach we considered was to make one big "label" table with the columns:
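As a rough, runnable sketch of this layout, the snippet below uses Python's built-in sqlite3 module. The table and column names come from the query above; the column types, the `languages` table, the sample rows, and the `fetch_products` helper are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical schema: names follow the query above; types and
# sample data are assumptions, not the original database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (
        language_code TEXT PRIMARY KEY
    );
    CREATE TABLE product (
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE product_translations (
        product_id          INTEGER NOT NULL REFERENCES product (id),
        language_code       TEXT    NOT NULL REFERENCES languages (language_code),
        product_name        TEXT    NOT NULL,
        product_description TEXT,
        PRIMARY KEY (product_id, language_code)
    );
    INSERT INTO languages VALUES ('en'), ('fr');
    INSERT INTO product VALUES (1);
    INSERT INTO product_translations VALUES
        (1, 'en', 'Chair', 'A wooden chair'),
        (1, 'fr', 'Chaise', 'Une chaise en bois');
""")


def fetch_products(conn, language_code):
    """Return (id, name, description) rows for a single language."""
    return conn.execute(
        """
        SELECT p.id, pt.product_name, pt.product_description
        FROM product p
        INNER JOIN product_translations pt
            ON p.id = pt.product_id
           AND pt.language_code = ?
        ORDER BY p.id
        """,
        (language_code,),
    ).fetchall()


french_rows = fetch_products(conn, "fr")
print(french_rows)
```

The composite primary key on (product_id, language_code) is one possible way to guarantee at most one translation row per product and language.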

    ( table_name , id_of_table , column_name , language_id , translated_text)

    effectively having one table to store all translations anywhere in the database. We decided against that too, because it would complicate writing queries: each 'normal' column would result in one row in the translation table, so the already large translation table would have to be joined to the normal table multiple times, once for each translated column. For the product example above, you would get queries like this:

    select     p.id
    ,          product_name.translated_text        product_name
    ,          product_description.translated_text product_description
    from       product p
    inner join translations product_name
    on         p.id = product_name.id_of_table
    and        'product'      = product_name.table_name
    and        'product_name' = product_name.column_name
    and        'fr'           = product_name.language_id
    inner join translations product_description
    on         p.id = product_description.id_of_table
    and        'product'             = product_description.table_name
    and        'product_description' = product_description.column_name
    and        'fr'                  = product_description.language_id
    

    As you can see, this is essentially an entity-attribute-value design, which makes it cumbersome to query.
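To make the cost of the single-table approach concrete, here is a small sketch, again using Python's sqlite3 module, that generates one join per translated column. The column list follows the "label" table described above; the column types, the sample rows, and the `build_query` helper are hypothetical, and the helper interpolates identifiers directly, so it is only suitable for trusted, hard-coded names.

```python
import sqlite3

# Hypothetical generic translation table, using the column list
# given above; types and sample data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY);
    CREATE TABLE translations (
        table_name      TEXT    NOT NULL,
        id_of_table     INTEGER NOT NULL,
        column_name     TEXT    NOT NULL,
        language_id     TEXT    NOT NULL,
        translated_text TEXT    NOT NULL,
        PRIMARY KEY (table_name, id_of_table, column_name, language_id)
    );
    INSERT INTO product VALUES (1);
    INSERT INTO translations VALUES
        ('product', 1, 'product_name',        'fr', 'Chaise'),
        ('product', 1, 'product_description', 'fr', 'Une chaise en bois');
""")


def build_query(table, columns):
    """Build a SELECT that joins translations once per translated column.

    Identifiers are interpolated as-is: pass trusted constants only.
    """
    select_parts = ["p.id"]
    join_parts = []
    for col in columns:
        select_parts.append(f"{col}.translated_text AS {col}")
        join_parts.append(
            f"INNER JOIN translations {col}\n"
            f"    ON p.id = {col}.id_of_table\n"
            f"   AND {col}.table_name = '{table}'\n"
            f"   AND {col}.column_name = '{col}'\n"
            f"   AND {col}.language_id = :lang"
        )
    header = f"SELECT {', '.join(select_parts)}\nFROM {table} p\n"
    return header + "\n".join(join_parts)


sql = build_query("product", ["product_name", "product_description"])
rows = conn.execute(sql, {"lang": "fr"}).fetchall()
join_count = sql.count("INNER JOIN")
print(sql)
print(rows)
```

The number of joins equals the number of translated columns, whereas the per-table design needs a single join regardless of how many columns are translated.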

    Another problem with that last approach is that it would make it hard, if not impossible, to enforce constraints on translated text (in our case mainly uniqueness constraints). With a separate table for the translations, you can easily and cleanly overcome those problems.
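As an illustration of the constraint point, the following sketch (Python's sqlite3 module again) declares a uniqueness constraint on a per-table translations table. The rule chosen here, that a product name must be unique within a language, is an assumed example and not necessarily the constraint used in the original project; the `try_insert` helper and the sample data are likewise hypothetical.

```python
import sqlite3

# Hypothetical per-table translations table with a uniqueness rule:
# no two products may share the same name within one language.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY);
    CREATE TABLE product_translations (
        product_id    INTEGER NOT NULL REFERENCES product (id),
        language_code TEXT    NOT NULL,
        product_name  TEXT    NOT NULL,
        PRIMARY KEY (product_id, language_code),
        UNIQUE (language_code, product_name)
    );
    INSERT INTO product VALUES (1), (2);
    INSERT INTO product_translations VALUES (1, 'fr', 'Chaise');
""")


def try_insert(conn, product_id, language_code, name):
    """Insert a translation; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO product_translations VALUES (?, ?, ?)",
                (product_id, language_code, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Same name in the same language for a different product: rejected.
duplicate_accepted = try_insert(conn, 2, "fr", "Chaise")
# A different name in that language: accepted.
other_name_accepted = try_insert(conn, 2, "fr", "Table")
print(duplicate_accepted, other_name_accepted)
```

Because each translated field has its own typed column, the rule is an ordinary declarative constraint on that column rather than something enforced in application code.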