
Romanization of Unicode text


I am looking for a way to transliterate Unicode letter characters from any language into accented Latin letters. The intent is to allow foreigners to gain insight into the pronunciation of names and words written in any non-Latin script.

Examples:

Greek: Romanize("Αλφαβητικός") returns "Alphabētikós" (or "Alfavi̱tikós")

Japanese: Romanize("しんばし") returns "shimbashi" (or "sinbasi")

Russian: Romanize("яйца Фаберже") returns "yaytsa Faberzhe" (or "jajca Faberže")

It should ideally support characters in the following scripts: CJK, Indic, Cyrillic, Semitic, and Greek. It should be data-driven and extensible, using data from the Unicode Consortium, the USA, the EU, or the UN. The code should be open source and written in .NET or Java.

Does such a library exist?


Solution

  • You can use Unidecode Sharp:

    a C# port of Python's Unidecode, which is itself a port of the Perl Text::Unidecode module. (PHP and Ruby implementations are also available.) Note that it produces plain ASCII approximations rather than accented Latin letters.

    Usage:

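    using System.Windows.Forms;            // MessageBox
    using BinaryAnalysis.UnidecodeSharp;   // Unidecode() extension method on string

    // ... inside a method of your application ...

    // Each call shows the romanized form of the input string.
    string greek = "Αλφαβητικός";
    MessageBox.Show(greek.Unidecode());

    string japanese = "しんばし";
    MessageBox.Show(japanese.Unidecode());

    string russian = "яйца Фаберже";
    MessageBox.Show(russian.Unidecode());
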

    I hope this helps.
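
Since the question also allows Java, here is a minimal sketch of the table-driven idea behind Unidecode-style libraries: every code point is looked up in a mapping table and replaced by a Latin string. The class name `TinyRomanizer` and its table are hypothetical and cover only the letters in the Russian example; a real library ships tables for whole Unicode blocks. This is an illustration under those assumptions, not Unidecode's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class TinyRomanizer {
    // Hypothetical, deliberately tiny table: only the Cyrillic letters
    // needed for the example below. Extending it is just adding rows.
    private static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put((int) 'я', "ya");
        TABLE.put((int) 'й', "y");
        TABLE.put((int) 'ц', "ts");
        TABLE.put((int) 'а', "a");
        TABLE.put((int) 'Ф', "F");
        TABLE.put((int) 'б', "b");
        TABLE.put((int) 'е', "e");
        TABLE.put((int) 'р', "r");
        TABLE.put((int) 'ж', "zh");
    }

    // Replaces every code point that has a table entry;
    // anything else (spaces, Latin letters, unknown scripts) passes through unchanged.
    public static String romanize(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String mapped = TABLE.get(cp);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(romanize("яйца Фаберже")); // prints "yaytsa Faberzhe"
    }
}
```

A per-character table like this is context-free, so it cannot pick between competing romanization schemes or handle scripts where pronunciation depends on neighbouring characters; that is the trade-off for being simple and easy to extend.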