php, nlp, arabic, stanford-nlp

Processing Arabic text for transliteration


I used the Ar-PHP library (http://www.ar-php.org/en_index-php-arabic.html) for Arabic-to-English and English-to-Arabic transliteration.

For plain English or Arabic text copied from the web, it works fine.

But English text written in the robert_bold and robert_regular_0 fonts looks like this:

[Image: sample words set in the robert_bold font]

When I convert that text, the output contains unsupported characters, like this:

ال ‘؟ س[
كير[ ’[ ت
شو ’\ ن
به ’; س
؟ م[ن
س ال@اناه

When I convert plain English text, the output contains only valid Arabic characters.

I am not a native of an Arabic-speaking country.

Any suggestions for improving my system would be appreciated.


Solution

  • I believe your problem lies in how the text is encoded in this 'robert_bold' font. The font seems to use characters other than the standard ones, so you will need to add those characters to your transliteration library as well.

    Look at one of the words you mentioned: Shu'un. The second 'u' in the picture has a line above it (a macron, ū). That puts it outside the normal range of characters, so the library has no transliteration for it.
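A minimal sketch of how you might diagnose and work around this before passing text to the transliterator. It assumes the copied text uses real Unicode codepoints, for example U+016B for ū and U+2019 for the curly apostrophe. If the font instead uses a custom legacy encoding, where ASCII symbols like '[' are drawn as accented letters, NFD decomposition will not help, and you would have to fill in the mapping table yourself. `FONT_CHAR_MAP`, `find_nonstandard_chars` and `normalize_for_transliteration` are hypothetical names used for illustration. They are not part of Ar-PHP.

```python
import unicodedata

# Hypothetical replacement table: these entries are examples, not the real
# robert_bold encoding. Add whatever codepoints your text actually contains.
FONT_CHAR_MAP = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK -> ASCII apostrophe
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> ASCII apostrophe
}


def find_nonstandard_chars(text):
    """Map each distinct non-ASCII character to (codepoint, Unicode name)."""
    found = {}
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
    return found


def normalize_for_transliteration(text, char_map=FONT_CHAR_MAP):
    """Reduce text to plain Latin letters a basic transliterator can handle."""
    # 1. Apply explicit one-to-one replacements first.
    text = "".join(char_map.get(ch, ch) for ch in text)
    # 2. NFD splits a letter like 'ū' into 'u' + COMBINING MACRON (U+0304).
    decomposed = unicodedata.normalize("NFD", text)
    # 3. Drop nonspacing combining marks (category Mn), keeping base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    sample = "Shu\u2019\u016bn"
    print(find_nonstandard_chars(sample))
    print(normalize_for_transliteration(sample))  # Shu'un
```

Stripping the macron discards the long-vowel information. If you need that information, a better fix is to add explicit mappings for ū and similar letters to the library's tables, as suggested above. Since Ar-PHP is a PHP library, you would normally do the same steps in PHP. The intl extension's `Normalizer::normalize($text, Normalizer::FORM_D)` followed by `preg_replace('/\p{Mn}/u', '', ...)` should be roughly equivalent.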