javascript, encoding, utf-8, diacritics

How to store an array of strings containing letters like á ö ő í in a JavaScript file?


I've got an array in a JavaScript file that contains Hungarian first names with special letters such as ö ő ó á í.

var names = ["Aba", "Abád", "Abbás", "Abdiás", "Abdon", "Ábel", "Abelárd"];

(The above is just a shortened array; the whole array is around 3000 items long.)

When I output the contents of "names" in an HTML document, I get garbled letters in place of the non-ASCII characters.

If I define the array directly in the UTF-8 encoded HTML file where it is output, the list displays correctly, whereas if I define the array in a JavaScript file, the content is garbled. See the screenshot: http://screencast.com/t/YJ83K9Mgm

Notepad++ reports that the JavaScript file is in ANSI encoding.

QUESTION: How can I store the name array (or, more generally, code containing these special letters) so that browsers display it properly?

(I am actually using MS Studio Express 2012 for coding. I could not find where to set the encoding of individual files.)

HERE IS THE SIMPLIFIED CODE WITH THE ARRAY DEFINED IN THE HTML HEADER:

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title>Name List Trial</title>
    <script src="nevekdata.js"></script>
    <script>
        // These are just a few Hungarian first names; the whole array is around 1400 long,
        // including letters like ö ő ó á etc.
        // If I define the "names" array here, the list is written out in the browser with the
        // special letters shown correctly.
        var names = ["Aba", "Abád", "Abbás", "Abdiás", "Abdon", "Abdullah", "Ábel", "Abelárd"];
        // If I put it into a JavaScript file, "nevekdata.js", I get garbled characters instead of the correct letters.
        function writeOutNames() {
            // Look the element up explicitly instead of relying on the implicit global created from its id.
            document.getElementById("outputnames").innerHTML = names.toString();
        }
    </script>
</head>
<body>
    <button onclick="writeOutNames()">Write Out names</button>
    <p></p>
    <p id="outputnames"></p>

</body>
</html>

Solution

  • You already said it yourself: the file is saved as ANSI, but it is served as UTF-8, so the browser decodes the ANSI-encoded bytes as if they were UTF-8.

    The charset parameter and headers only tell the browser which encoding your files are in; they do nothing to the "physical" bytes of the file. For this to work, you need both the charset declaration and a file whose bytes are actually encoded as UTF-8.

    So encode the file as UTF-8: in Notepad++, save the file as UTF-8 without BOM.
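
To make the mismatch concrete, here is a minimal sketch of what happens to the bytes of "Abád". It assumes the ANSI code page is Windows-1250 or Windows-1252, where á is the single byte 0xE1; the helper name decodeAsUtf8 is made up for illustration. It uses the standard TextEncoder and TextDecoder, available in current browsers and in Node.js.

```javascript
// Bytes of "Abád" as a single-byte ANSI code page would store them
// (assumption: Windows-1250 or Windows-1252, where á is the single byte 0xE1).
const ansiBytes = Uint8Array.of(0x41, 0x62, 0xe1, 0x64);

// Bytes of the same name when the file is saved as UTF-8 (á becomes 0xC3 0xA1).
// The \u00e1 escape keeps this snippet independent of its own file encoding.
const utf8Bytes = new TextEncoder().encode("Ab\u00e1d");

// Hypothetical helper: decode bytes the way a browser does once it has been
// told that the content is UTF-8.
function decodeAsUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

const fromAnsi = decodeAsUtf8(ansiBytes);
const fromUtf8 = decodeAsUtf8(utf8Bytes);

// The lone 0xE1 is not valid UTF-8, so it turns into U+FFFD (the replacement character).
console.log(fromAnsi === "Ab\ufffdd"); // true
// The UTF-8 bytes round-trip to the original name.
console.log(fromUtf8 === "Ab\u00e1d"); // true
```

The declared charset never changes the bytes; it only selects the decoder. That is why re-saving nevekdata.js as UTF-8 fixes the output, while changing the meta tag alone does not.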