Tags: javascript, html, utf-8

Storing null bytes in Javascript string literals


Consider the following HTML:

<!DOCTYPE html>
<html>
    <body>
        <script>
            const a = " ... ";

            for (let i = 0; i < a.length; ++i) {
                console.log(a.charCodeAt(i));
            }
        </script>
    </body>
</html>

Here the ... in the string is actually the raw ASCII control characters NUL (0), SOH (1), STX (2). The file is saved as UTF-8 (the only valid HTML5 encoding).

When I open it in Firefox or Chrome it prints this:

32
65533
1
2
32

However, according to my reading of the spec, I should be able to store a null byte:

StringLiteral ::
    " DoubleStringCharacters_opt "
    ' SingleStringCharacters_opt '

DoubleStringCharacters ::
    DoubleStringCharacter DoubleStringCharacters_opt

DoubleStringCharacter ::
    SourceCharacter but not one of " or \ or LineTerminator
    <LS>
    <PS>
    \ EscapeSequence
    LineContinuation

SourceCharacter ::
    any Unicode code point

and

All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in ECMAScript source text where permitted by the ECMAScript grammars.
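
As a sanity check of that reading (just a sketch of my own, not from the spec), the engine does seem to accept a raw U+0000 when it is handed source text directly, bypassing the HTML parser:

// Build source text containing a raw U+0000 inside a string literal and
// hand it straight to the engine. Per the grammar above this should parse.
const src = '"' + String.fromCharCode(0) + '"';
const s = eval(src);
console.log(s.length);        // 1
console.log(s.charCodeAt(0)); // 0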

So why won't it let me store a null byte?

(Yes, I am aware of all the implications; please don't tell me that I shouldn't want to do this.)

Edit: to be clear, the string is not " \x00\x01\x02 " written out as escape sequences. It is this:

[image: evil string]
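
For comparison, the escaped form of the same string behaves as I'd expect, because the bytes in the file are plain ASCII (backslash, x, 0, 0, and so on) and the HTML parser never sees a raw NUL. A quick check of my own:

<!DOCTYPE html>
<html>
    <body>
        <script>
            // Same characters written as escape sequences.
            // Prints 32, 0, 1, 2, 32 instead of 32, 65533, 1, 2, 32.
            const b = " \x00\x01\x02 ";

            for (let i = 0; i < b.length; ++i) {
                console.log(b.charCodeAt(i));
            }
        </script>
    </body>
</html>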


Solution

  • If you move the Javascript to an external .js file, it does work fine, so this is a limitation of HTML, not Javascript (a minimal repro is sketched below).

    Apparently HTML parsers will emit an unexpected-null-character error and either ignore it or replace it with U+FFFD.

    I believe the relevant state is the Script data state, which explicitly calls out null bytes as being disallowed.
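
    A minimal repro of that external-file approach (my own sketch; the file names make-evil.js and evil.js are made up): generate a .js file whose bytes contain the raw control characters, then load it with <script src> so those bytes never pass through the HTML tokenizer.

    // make-evil.js (Node): writes evil.js, whose string literal contains the
    // raw NUL/SOH/STX characters. They are below U+0080, so UTF-8 stores
    // them as the single bytes 0x00, 0x01, 0x02.
    const fs = require("fs");

    const body =
        'const a = " \u0000\u0001\u0002 ";\n' +
        "for (let i = 0; i < a.length; ++i) console.log(a.charCodeAt(i));\n";

    fs.writeFileSync("evil.js", body);

    <!-- index.html: loading the script externally prints 32, 0, 1, 2, 32. -->
    <!DOCTYPE html>
    <html>
        <body>
            <script src="evil.js"></script>
        </body>
    </html>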