
Unary operator ambiguity


Looking at the C grammar, it seems that the input ++i can have two derivations: it could be treated either as the prefix increment operator, or as two applications of unary plus (which performs the integer promotions), i.e. +(+i). The same goes for --i, which could be read as -(-i).
What am I missing?

unary-expression:
   postfix-expression
   ++ unary-expression
   -- unary-expression
   unary-operator cast-expression
   sizeof unary-expression
   sizeof ( type-name )

unary-operator: one of
    & * + - ~ !

cast-expression:
    unary-expression
    ( type-name ) cast-expression

Solution

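  • The lexer applies the maximal munch principle: it takes as many characters as it can to form a valid token, which rules out this kind of ambiguity. The characters ++ therefore always become a single ++ token, so the parser never sees two separate + tokens in ++i.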

    We can confirm this in the draft C99 standard, section 6.4 Lexical elements, which says:

    If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token. [...]

    and it provides two examples:

    EXAMPLE 1 The program fragment 1Ex is parsed as a preprocessing number token (one that is not a valid floating or integer constant token), even though a parse as the pair of preprocessing tokens 1 and Ex might produce a valid expression (for example, if Ex were a macro defined as +1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating constant token), whether or not E is a macro name.

    and

    EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression.