php mysql character-encoding arabic arabic-support

Incorrect rendering of Language (e.g. Arabic)


I apologize if this question is not directly related to programming. I'm having an issue that shows up in two situations:

  1. I have a website where I store Arabic words in a database, then retrieve them and display them on a page using PHP. (Here's the link to my page, which displays Arabic incorrectly.)

  2. I visit a random website where most of the content is supposed to be in Arabic. (An example of a random website that gives me this issue.)

In both cases, the Arabic text is displayed as 'ÇáÔíÎ: ÇáÓáÝ ãÚäÇå ÇáãÊÞÏãæä Ýßá'... or similar garbled characters. Note that in the first case I may be able to correct it, since I control the content and can set the encoding.

But what about the second case? (This is the part I want to apologize for, since it isn't directly related to code on my end.) What can I do for random websites I visit where the Arabic text is displayed incorrectly? Any help would really be appreciated.


Solution

  • For the second case:

    This website is encoded with Windows-1256 (Arabic), but it wrongly declares itself to be encoded with ISO 8859-1 (Latin/Western European). If you look at the source, you can see that it declares <meta ... charset=ISO-8859-1" /> in its head section.

    So the server sends your browser an HTML file that is encoded with Windows-1256, but your browser decodes this file as ISO 8859-1 (because that's what the file claims to be).

    For ASCII characters this is no problem, since both encodings encode them identically. The Arabic characters are a different matter: each byte that represents an Arabic character in Windows-1256 maps to some Latin character in ISO 8859-1, and these garbled Latin characters are what you see in place of the Arabic text.

    If you want to display all the text of this website correctly, you can manually set the character encoding that your browser uses to decode this website.

    In Chrome, for example, you can do this by installing the Set Character Encoding extension, then right-clicking on the page and selecting:

    Set Character Encoding > Arabic (Windows-1256)

    In Safari, you can do it simply by selecting:

    View > Text Encoding > Arabic (Windows).

    The same should be possible in other browsers, such as Firefox or Internet Explorer.
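
The byte-level mix-up described for the second case can also be reversed in code. What follows is a minimal Python sketch, not part of the original answer. The function name `repair_mojibake` is hypothetical, and the sketch assumes the garbled text really came from Windows-1256 bytes that were decoded as ISO 8859-1.

```python
def repair_mojibake(garbled: str) -> str:
    """Undo a Windows-1256 page that was wrongly decoded as ISO 8859-1."""
    # ISO 8859-1 maps every byte 0x00-0xFF to the code point with the same
    # number, so encoding back to latin-1 recovers the original bytes exactly.
    raw_bytes = garbled.encode("latin-1")
    # Decode those bytes with the encoding the page was actually written in.
    return raw_bytes.decode("cp1256")
```

For instance, `repair_mojibake("ÇáÔíÎ")`, the first word of the garbled sample in the question, returns the Arabic word "الشيخ". If the text was garbled by a different pair of encodings, the two codec names would have to change accordingly.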


  • For the first case:

    Your website (the HTML file that your server sends to the browser) is encoded with UTF-8. However, this HTML file doesn't contain any encoding declaration, so the browser doesn't know which encoding to use when decoding it.

    In this case, the browser is likely to use a default encoding to decode the file, which typically is ISO 8859-1/Windows-1252 (Latin/Western European). The result is the same as in the above case: all the Arabic characters are decoded to garbled Latin characters.

    To solve this problem, declare that your HTML file is encoded with UTF-8 by adding the following tag to the <head> section of your file:

    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
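
To see why the declaration matters in the first case, here is a small Python sketch for illustration only. The helper `build_page` and the sample word are assumptions, not taken from the asker's site. It builds a UTF-8 page that carries the tag, then compares a Latin-1 fallback decoding with the correct UTF-8 decoding.

```python
def build_page(body_text: str) -> bytes:
    """Return a minimal HTML document, encoded as UTF-8, that declares its charset."""
    html = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
        "</head>\n"
        f"<body>{body_text}</body>\n</html>\n"
    )
    return html.encode("utf-8")


sample = "\u0639\u0631\u0628\u064a"  # four Arabic letters, written as escapes
page_bytes = build_page(sample)

# What a browser does when it falls back to a Latin default: every byte
# becomes one character. Python's "latin-1" codec stands in here for the
# ISO 8859-1/Windows-1252 default mentioned above.
fallback_view = page_bytes.decode("latin-1")

# What a browser does once it honours the UTF-8 declaration.
correct_view = page_bytes.decode("utf-8")

# Each of these Arabic letters takes two bytes in UTF-8, so the fallback view
# is longer by one character per letter and no longer contains the word.
assert sample in correct_view
assert sample not in fallback_view
assert len(fallback_view) == len(correct_view) + len(sample)
```

Since the page is generated with PHP, the same information can also be sent as an HTTP header by calling `header('Content-Type: text/html; charset=UTF-8');` before any output is written. Whether that is needed in addition to the tag depends on how the server is configured.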