Consider the following HTML:
<!DOCTYPE html>
<html>
<body>
<script>
  const a = " ... ";
  for (let i = 0; i < a.length; ++i) {
    console.log(a.charCodeAt(i));
  }
</script>
</body>
</html>
Where the ... in the string is actually the ASCII characters NUL (0), SOH (1), STX (2). The file is saved as UTF-8 (the only valid HTML5 encoding).
When I open it in Firefox or Chrome it prints this:
32
65533
1
2
32
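(For reference, 65533 is the decimal value of U+FFFD, the Unicode replacement character, which can be checked directly:)

```javascript
// 65533 is the decimal code of U+FFFD, the Unicode replacement character
console.log(0xFFFD);                    // 65533
console.log(String.fromCharCode(65533)); // "�"
```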
However according to my reading of the spec, I should be able to store a null byte:
StringLiteral ::
    " DoubleStringCharacters_opt "
    ' SingleStringCharacters_opt '

DoubleStringCharacters ::
    DoubleStringCharacter DoubleStringCharacters_opt

DoubleStringCharacter ::
    SourceCharacter but not one of " or \ or LineTerminator
    <LS>
    <PS>
    \ EscapeSequence
    LineContinuation

SourceCharacter ::
    any Unicode code point
and
All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in ECMAScript source text where permitted by the ECMAScript grammars.
So why won't it let me store a null byte?
(Yes I am aware of all the implications, please don't tell me that I shouldn't want to do this.)
Edit: to be clear, the string is not " \x00\x01\x02 " with backslash escape sequences. It contains the literal raw control characters themselves.
If you move the JavaScript to an external .js file it works fine, so this is a limitation of HTML, not JavaScript.
Apparently HTML parsers will emit an unexpected-null-character parse error and either ignore the character or replace it with U+FFFD. I believe the relevant tokenizer state is the Script data state, which explicitly calls out null characters: on U+0000 it reports the parse error and emits a U+FFFD REPLACEMENT CHARACTER instead.