php, utf-8, internationalization, gettext, utf-16

What should I know to make my I18N application work in Japanese?


I'm working on an I18N application that will be localized into Japanese. I don't know a word of Japanese, and I'm first wondering whether UTF-8 is enough for that language.

Usually, for European languages, UTF-8 is enough: I set my database charset/collation to utf8_general_ci (in MySQL) and serve my HTML views in UTF-8, and that's all it takes.

But what about Japanese, is there something else to do?

By the way, my application needs to handle English, French, and Japanese, but later on I may need to add other languages, say Russian.

How can I set up my I18N application so that it can be deployed widely without having to change much configuration?

Are there any best practices?

I'm also planning to use gettext. I'm pretty sure it supports such languages without any problems, as it is the de facto standard for almost all GNU software, but any feedback is welcome.


Solution

  • A couple of points:

    • UTF-8 is fine for your app-internal data, but if you need to process user-supplied documents (e.g. uploads), those may use other encodings like Shift-JIS or ISO-2022-JP.
    • Japanese text does not use whitespace between words. If your app needs to split text into words somewhere, you've got a problem.
    • Apart from the text itself, date and number formats differ as well.
    • The generic collation may not lead to a useful sort order for Japanese text - if your app involves large lists that people have to find things in, this can be a problem.
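
To illustrate the first point, here is a minimal sketch of normalising an uploaded text to UTF-8 before the rest of the app touches it. It assumes the mbstring extension is available; the helper name `to_utf8` and the candidate list are made up for this example, not part of any library. Detection is only a heuristic (some byte sequences are valid in more than one Japanese encoding), so if the upload declares its encoding, prefer that over guessing.

```php
<?php
// Hypothetical helper for illustration: convert bytes in an unknown
// (probably Japanese) encoding to UTF-8. Requires the mbstring extension.
function to_utf8(string $bytes, array $candidates = ['ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS']): ?string
{
    // Strict mode only accepts an encoding in which the bytes are valid.
    $detected = mb_detect_encoding($bytes, $candidates, true);
    if ($detected === false) {
        return null; // nothing matched: let the caller decide what to do
    }
    return mb_convert_encoding($bytes, 'UTF-8', $detected);
}

// Usage: simulate a Shift-JIS upload and bring it back to UTF-8.
$upload = mb_convert_encoding('日本語のテキスト', 'SJIS', 'UTF-8');
$text = to_utf8($upload);
```

ISO-2022-JP is listed before UTF-8 because it is a 7-bit encoding: its bytes are also valid UTF-8, so with UTF-8 first it could be misdetected. Text that really is UTF-8 Japanese contains high bytes, which ISO-2022-JP rejects, so it still falls through to UTF-8.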