javascript · eclipse · encoding · interpreter · ecmascript-5

'Source code charset' vs. 'Execution charset'


In the JavaScript world,

I learnt that the JavaScript source code charset is usually UTF-8 (but not always).

I learnt that the JavaScript (execution) charset is UTF-16.

How do I interpret these two terms?

Note: The answer can be given language-agnostically, e.g. by using another language like Java.


Solution

  • Most source code is written in utf-8, or should be. Since source code is mostly English, using ASCII-compatible characters, and utf-8 is the most compact encoding for that character range, there is a clear advantage. In any case, it has become the de facto standard.

    JavaScript was developed before the rest of the world settled on utf-8, so it follows the Java practice of using utf-16 for all strings, which was pretty forward-thinking at the time. This means that all strings, whether written in the source or obtained some other way, will be (re-)encoded in utf-16.
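
    As a quick illustration (a minimal sketch, runnable in a modern browser console or Node.js): however the source file is saved, the parsed string literal below is held as UTF-16 code units, which is why `charCodeAt` exposes a surrogate pair.

    ```javascript
    // U+1D11E MUSICAL SYMBOL G CLEF: 4 bytes in a utf-8 source file (F0 9D 84 9E),
    // but stored at runtime as two utf-16 code units (a surrogate pair).
    const clef = "𝄞";

    console.log(clef.length);                      // 2 -> counted in utf-16 code units
    console.log(clef.charCodeAt(0).toString(16));  // "d834" -> high surrogate
    console.log(clef.charCodeAt(1).toString(16));  // "dd1e" -> low surrogate
    console.log(clef.codePointAt(0).toString(16)); // "1d11e" -> the actual code point
    ```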

    For the most part it’s unimportant. Source code is for humans and the execution character set is for machines. However, this does lead to two minor issues:

    • JavaScript strings may waste a lot of space if your text is largely in the ASCII range (which it would be in English, or to some extent in other languages that use spaces).
    • Like utf-8, utf-16 is also variable-width, though most characters in most languages fit within the usual 2 bytes; however, JavaScript may miscalculate the length of a string if some characters need 4 bytes (two utf-16 code units), because `length` counts code units rather than characters; see the snippet at the end of this answer.

    Apart from questions of which encoding better suits a particular human language, there is no other advantage of one over the other. If JavaScript were developed more recently, it would probably have used utf-8 encoding for strings.
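
    A short sketch of both issues (assuming a runtime that provides `TextEncoder`, as modern browsers and Node.js do):

    ```javascript
    // Issue 1: space. ASCII-range text is 1 byte per character in utf-8,
    // but as specified a JavaScript string holds 2-byte utf-16 code units
    // (engines may use a more compact representation internally).
    const greeting = "hello world";
    console.log(new TextEncoder().encode(greeting).length); // 11 bytes as utf-8
    console.log(greeting.length * 2);                       // ~22 bytes as utf-16

    // Issue 2: length. `length` counts utf-16 code units, not perceived characters.
    const emoji = "😀"; // U+1F600, outside the Basic Multilingual Plane
    console.log(emoji.length);      // 2 -> one surrogate pair
    console.log([...emoji].length); // 1 -> string iteration is code-point aware
    ```

    Code-point-aware operations such as the string iterator, `Array.from`, or `codePointAt` avoid the second pitfall.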