
Reading Japanese characters in PHP: converting kanji into a readable form


I'm not sure whether PHP is capable of this, but I have Japanese kanji such as 『漢字』 being displayed. I'd like PHP (or some other language) to read these characters and show how they are pronounced, either in hiragana 「かんじ」 or in romaji 「kanji」.

This way I will be able to display characters like this.

kanji
かんじ
漢字

Basically, add furigana to kanji (how to read the character).


Solution

  • This is not a trivial problem.

    Consider the problems that arise from okurigana (送りがな, the kana endings of conjugated words) and from the split between on'yomi (音読み) and kun'yomi (訓読み) readings. How would PHP know that '食' is read differently in '食事' (shokuji) and '食べる' (taberu)?

    You need a morphological analyzer for this, such as MeCab.

    If you install MeCab on your server, you can call it from PHP via exec().

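    $key = '漢字';
    // escapeshellarg() needs a UTF-8 locale, otherwise it drops multibyte characters
    setlocale(LC_CTYPE, 'en_US.UTF-8');
    $phonetic = exec('echo ' . escapeshellarg($key) . ' | mecab -O yomi');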
    

    Note that the yomi output format prints the phonetic reading in katakana.

    To prevent encoding issues, you might want to call putenv('LANG=en_US.UTF-8'); before exec() so that the output is not garbled when it is stored in a PHP variable.

    Even MeCab cannot give you 100% accuracy, because many words have more than one valid reading and the correct one depends on context.
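
The approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: the `mecab` binary is on the PATH with a UTF-8 dictionary, and the helper names (`mecab_yomi`, `utf8_3byte`, `katakana_to_hiragana`, `furigana_html`) are hypothetical, not part of PHP or MeCab. It feeds the text to MeCab on stdin via proc_open() instead of building a shell string, converts the katakana reading to hiragana, and wraps the result in HTML ruby markup so the reading is shown above the kanji.

```php
<?php
// Sketch only: assumes `mecab` is installed and uses a UTF-8 dictionary.

// Returns the katakana reading of $text, or null if mecab could not be run.
function mecab_yomi(string $text): ?string
{
    $spec = [
        0 => ['pipe', 'r'], // stdin: the text to analyze
        1 => ['pipe', 'w'], // stdout: the reading
        2 => ['pipe', 'w'], // stderr: captured so it does not leak into the page
    ];
    // Pass a UTF-8 locale to the child process, as suggested above.
    $env = [
        'LANG' => 'en_US.UTF-8',
        'PATH' => getenv('PATH') ?: '/usr/local/bin:/usr/bin:/bin',
    ];
    $proc = @proc_open('mecab -O yomi', $spec, $pipes, null, $env);
    if (!is_resource($proc)) {
        return null;
    }
    fwrite($pipes[0], $text . "\n");
    fclose($pipes[0]);
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    // A non-zero exit status (for example, mecab not found) counts as failure.
    return proc_close($proc) === 0 ? trim($out) : null;
}

// Encodes a code point in the range U+0800..U+FFFF as 3-byte UTF-8.
function utf8_3byte(int $cp): string
{
    return chr(0xE0 | ($cp >> 12))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Maps katakana U+30A1..U+30F6 onto hiragana, which sits 0x60 code points lower.
function katakana_to_hiragana(string $kana): string
{
    static $map = null;
    if ($map === null) {
        $map = [];
        for ($cp = 0x30A1; $cp <= 0x30F6; $cp++) {
            $map[utf8_3byte($cp)] = utf8_3byte($cp - 0x60);
        }
    }
    return strtr($kana, $map);
}

// Wraps a word and its reading in ruby markup; <rp> is a fallback for
// browsers that do not render ruby annotations.
function furigana_html(string $kanji, string $reading): string
{
    $k = htmlspecialchars($kanji, ENT_QUOTES, 'UTF-8');
    $r = htmlspecialchars($reading, ENT_QUOTES, 'UTF-8');
    return "<ruby>{$k}<rp>(</rp><rt>{$r}</rt><rp>)</rp></ruby>";
}

// Usage: prints ruby markup only when mecab is actually available.
$reading = mecab_yomi('漢字');
if ($reading !== null) {
    echo furigana_html('漢字', katakana_to_hiragana($reading)), "\n";
}
```

If the mbstring extension is enabled, mb_convert_kana($reading, 'c', 'UTF-8') can replace the hand-written katakana-to-hiragana mapping. Note that this sketch annotates a whole word at once; for a word such as 食べる, placing the reading only over the kanji would require splitting off the okurigana first.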