
CSS "escape" syntax diagram confusion


Have a look at the syntax diagram for the escape language:

I read the second set of transitions as follows:

The state reached via the \ symbol can transition to the accepting state if the next symbol is neither a newline nor a hex digit (in other words, if it belongs to any language other than the newline and hex digit languages).

That same state can transition to a different state if the next symbol is in the hex digit language.

Isn't this contradictory?


Solution

  • The railroad diagrams are ambiguous, not contradictory. But note the first line in that section, which says:

    This section is non-normative.

    and then goes on to explain precisely what that means: the railroad diagrams are incomplete and only informative; they are intended to give you an intuitive grasp of the syntax. The diagrams are not to be used as reference material, and they make no attempt to define the semantics of each token.

    Clearly, it is possible that more than one path through a railroad diagram applies to a given token. But since the railroad diagram is not semantic, that doesn't matter. Moreover, many of the railroad diagrams do not tell you where the token ends; nothing in the diagram indicates that it is necessary to accept the longest possible match (although that is usually what is required).

    The definitive tokenisation algorithm is provided as a procedure written out in English, which is not nearly as easy to understand as the railroad diagram. Since those algorithms are normative and semantic, ambiguity would be a problem. But I think you'll find that they are all deterministic. For example, the procedure for consuming an escape sequence (once the initial \ has been consumed) first tests whether the next code point is a hex digit and only otherwise falls through to the remaining cases, so exactly one branch can apply to any input.
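
As a rough illustration of why the normative procedure leaves no room for ambiguity, here is a Python sketch that paraphrases the "consume an escaped code point" steps of the CSS Syntax Module. The function name `consume_escape`, the string-plus-index interface, and the constants are assumptions made for this example, not anything the specification defines. It also assumes the caller has already consumed the \ and has checked that the next code point is not a newline.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
# After input preprocessing, CSS whitespace is newline, tab, or space.
WHITESPACE = frozenset("\n\t ")
REPLACEMENT = "\uFFFD"
MAX_CODE_POINT = 0x10FFFF


def consume_escape(text, pos):
    """Hypothetical helper: decode the escape whose backslash ends just before pos.

    Returns (code_point, new_pos). Assumes text[pos] is not a newline.
    """
    # End of input right after the backslash: yield the replacement character.
    if pos >= len(text):
        return REPLACEMENT, pos

    ch = text[pos]
    # Anything that is not a hex digit simply stands for itself.
    if ch not in HEX_DIGITS:
        return ch, pos + 1

    # Hex digit: take as many hex digits as possible, at most six in total.
    end = pos + 1
    while end < len(text) and end - pos < 6 and text[end] in HEX_DIGITS:
        end += 1
    value = int(text[pos:end], 16)

    # A single whitespace code point after the digits belongs to the escape.
    if end < len(text) and text[end] in WHITESPACE:
        end += 1

    # Zero, surrogates, and out-of-range values map to the replacement character.
    if value == 0 or 0xD800 <= value <= 0xDFFF or value > MAX_CODE_POINT:
        return REPLACEMENT, end
    return chr(value), end
```

With this sketch, `consume_escape("41 b", 0)` returns `("A", 3)`: the digits `41` are read greedily, the following space is swallowed as part of the escape, and scanning resumes at `b`. The branches are tested in a fixed order, which is exactly the information the railroad diagram leaves out.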