We have an enterprise web application where every bit of text in the system is localised to the user's browser's culture setting.
So far we have only supported English, American (similar but mis-spelt ;-) and French (for the Canadian Gov't - app in English or French depending on user preference). During development we also had some European languages in mind like Dutch and German that tend to concatenate words into very long ones.
We're currently investigating support for eastern languages: Chinese, Japanese, and so on. I understand that these use phonetic input converted to written characters. How does that work on the web? Do the same events fire while inputs and textareas are being edited (we're quite Ajax heavy).
What conventions do users of these top-down languages expect online?
What effect does their dual-input (phonetic typing + conversion) have on web controls?
With RTL languages like Arabic do users expect the entire interface to be mirrored? For instance should things like OK/Cancel buttons be swapped and on the left?
Read Globalization Step-by-Step by Microsoft.
I can answer the specifics on CJKV, but you probably want a book on this topic. I haven't read it but CJKV Information Processing is from O'Reilly (2nd ed due Dec, 2008).
I understand that these use phonetic input converted to written characters. How does that work on the web?
The input is done by a class of software called an IME (Input Method Editor) on Windows, Mac, and Linux (e.g. SCIM). When an IME is turned on, the input from the keyboard first goes to the IME, and the user gets to pick the correct kanji/hiragana combo. When the user commits by hitting return key, the IME types in the kanji/hiragana into the web browser using the current encoding. Encoding situation was a big mess, but if you are writing a web app, go with an encoding of Unicode. I suggest UTF-8.
Do the same events fire while inputs and textareas are being edited?
A Unicode savvy web browser and OS combo handles multiple languages. For example, one can use English normal version of Firefox to browse and post to a Japanese website. From the browsers point of view, it's just an array of "bla bla bla" in Unicode. In other words, if the event fires up in English, the same event should fire up in CJKV if you use a Unicode variant.
What conventions do users of these top-down languages expect online?
CJKV readers expect left-to-right online. Math and science textbooks are written from left-to-right. Most word processors, including localized version of Word, write left-to-right.
What effect does their dual-input (phonetic typing + conversion) have on web controls?
For the most part you should not have to worry about it, unless you are trapping keyboard events. For example, I hate using Japanese keyboard with bunch of extra keyboard. So, when I have to assign IME on/off command to some key on US keyboard. I personally use right-Alt. Also, spacebar and enter key is used during conversion, but not sure if these events are passed to browser.