Looking at the C grammar, it seems that the input ++i
can have 2 derivation: either be treated as the prefix increment operator, or as 2 integer promotion, like +(+i)
(same goes for --i
).
What am I missing?
unary-expression:
postfix-expression
++ unary-expression
-- unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
unary-operator: one of
& * + - ~ !
cast-expression:
unary-expression
( type-name ) cast-expression
The lexer is using the maximal munch principle and will take as many characters as it can to form a valid token to avoid these types of ambiguity.
We can confirm this by going to the draft C99 standard section 6.4
Lexical elements which says:
If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token. [...]
and it provides two examples:
EXAMPLE 1 The program fragment 1Ex is parsed as a preprocessing number token (one that is not a valid floating or integer constant token), even though a parse as the pair of preprocessing tokens 1 and Ex might produce a valid expression (for example, if Ex were a macro defined as +1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating constant token), whether or not E is a macro name.
and
EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression.