html, utf-8, character-encoding, meta

How to make a .htm page accept letters of languages other than English?


I am currently working on an application that converts a .msg file to PDF. I use a PDF converter that converts HTML to PDF, so I convert the email to HTML and then use the tool to convert it to PDF. Everything was working fine until I tried to convert a French email. When I open the .htm file for the French email in Notepad++, it displays the French accented letters (é, à, ù, ê, ë, ...) fine, but when I open it in a browser, the accented letters are replaced by strange symbols. When I added the <meta http-equiv="content-type" content="text/html;charset=utf-8"> tag to the HTML, it started showing the French letters correctly. So, will this meta tag make the HTML work for all possible French letters, or only some of them? Also, is there a tag that makes the HTML accept letters from any language? Thanks in advance.


Solution

  • Computers deal in binary data. Under the hood, all the characters (letters, numbers, punctuation, etc.) in an HTML (or other kind of text) document are just groups of 1s and 0s as far as the computer is concerned.

    Which characters those groups of 1s and 0s represent depends on the choice of character encoding.

    Unicode encodings, including UTF-8, can represent just about any human language.

    If the document is actually encoded in UTF-8 and you tell the browser that it is encoded in UTF-8, then you are highly unlikely to run into characters that you can't represent.

    For further reading, start with Character encodings: Essential concepts
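
To make the point about bytes and encodings concrete, here is a minimal Python sketch. It assumes the browser fell back to windows-1252 (a common legacy default in Western locales, but only an assumption about your setup), and `build_page` is a hypothetical helper written for this illustration, not part of any converter or library.

```python
import html

# The same stored bytes, read under two different encodings.
sample = "é"
stored = sample.encode("utf-8")   # the bytes actually saved in the .htm file

wrong = stored.decode("cp1252")   # assumed legacy fallback: shows "strange symbols"
right = stored.decode("utf-8")    # what the reader sees once UTF-8 is declared

print(repr(wrong), repr(right))


def build_page(body_text: str) -> bytes:
    """Hypothetical helper: return a minimal HTML page as UTF-8 bytes.

    The charset declared in the meta tag matches the encoding used
    to produce the bytes, which is the condition described above.
    """
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta http-equiv="content-type" content="text/html;charset=utf-8">\n'
        "</head>\n"
        f"<body><p>{html.escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    return page.encode("utf-8")


# French, Japanese and Greek text all survive the same round trip.
page_bytes = build_page("é à ù ê ë 日本語 Ελληνικά")
print(page_bytes.decode("utf-8"))
```

The meta tag on its own converts nothing. It only tells the browser how to read the bytes, so the file must also be saved as UTF-8, as the helper does with `encode("utf-8")`.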