javascript ecmascript-6 ecmascript-2017 ecma

ECMAScript 2017: EscapeSequence in StringLiteral


The below excerpts refer to ECMAScript 2017.

10.1 Source Text, Syntax

Escape sequences, like \u000A, will not be interpreted as line terminators (i.e. new lines):

In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value. Within a comment, such an escape sequence is effectively ignored as part of the comment.

ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences.

If the Unicode escape sequence \u000A occurs within a string literal in a Java program, it is interpreted as a line terminator, which is not allowed within a string literal.

A Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.

11.8.4 String Literals

Code points may appear as escape sequences in string literals, except reverse solidus (\).

A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), U+2028 (LINE SEPARATOR), U+2029 (PARAGRAPH SEPARATOR), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence.

Questions

  1. How can an escape sequence occur inside a string literal, if \ is not allowed (11.8.4)?
  2. 11.8.4 states that code points may be represented as escape sequences. 10.1 states that the escape sequence \u000A inside a string literal is not interpreted as a line terminator. These two seem contradictory. If it is not interpreted as a line break inside the string literal, then how is it interpreted (if at all)?

Solution

  • How can an escape sequence occur inside a string literal, if \ is not allowed (11.8.4)?

    I think the key part of that section is "appear literally". It is saying that a \ written in the string literal does not translate into a backslash in the resulting string itself. It's not saying backslashes are disallowed; it is saying they don't "appear literally", because a \ always introduces an escape sequence (or a line continuation).
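A minimal sketch of what "appear literally" means in practice. The variable name `withBackslash` is only illustrative; the snippet shows that a backslash in the source never stands for itself and that a backslash in the string value has to be written as an escape sequence.

```javascript
// Two characters in the source (\\) yield a single backslash in the value.
const withBackslash = "a\\b";
console.log(withBackslash.length);                  // 3
console.log(withBackslash.charCodeAt(1) === 0x5C);  // true

// The Unicode escape for U+005C (REVERSE SOLIDUS) produces the same string.
console.log("a\u005Cb" === withBackslash);          // true

// A backslash before a character with no special escape meaning is dropped:
// the value of "\q" is just "q".
console.log("\q" === "q");                          // true
```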

  • 10.1 states that escape sequence \u000A inside a string literal is not interpreted as a line terminator.

    You skipped the earlier part of that quote: "always contributes to the literal". \u000A is perfectly allowed, and it does get added to the content of the string. That passage is saying that it isn't treated as a line terminator in the sense of the lexical grammar. It is saying that

    var foo = "one\u000Atwo";
    

    is allowed even though

    var foo = "one
    two";
    

    is a syntax error. Both try to put a newline code point between the words, but the first is allowed because the lexer never treats the escape sequence as a line terminator.
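The difference can be checked with a short sketch. The names `escaped`, `rawSource`, `rawError` and `escapedSource` are hypothetical; the invalid source is assembled at run time and handed to the `Function` constructor so that the snippet itself stays parseable.

```javascript
// "\u000A" inside a string literal contributes a LINE FEED to the string value.
const escaped = "one\u000Atwo";
console.log(escaped.length);          // 7
console.log(escaped.charCodeAt(3));   // 10, i.e. U+000A
console.log(escaped === "one\ntwo");  // true

// A raw line break between the quotes is rejected by the parser.
// String.fromCharCode(10) inserts an actual LINE FEED into the source text.
const rawSource = 'var foo = "one' + String.fromCharCode(10) + 'two";';

let rawError = null;
try {
  new Function(rawSource); // parsing the body throws before anything runs
} catch (err) {
  rawError = err;
}
console.log(rawError instanceof SyntaxError); // true

// The same statement written with the escape sequence parses and runs.
const escapedSource = 'var foo = "one\\u000Atwo"; return foo;';
console.log(new Function(escapedSource)() === escaped); // true
```

So the escape sequence ends up as a line feed in the string value, while a line feed typed directly between the quotes ends the line in the source text and breaks the literal.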