c++ c-preprocessor standards c++14

Sign in Preprocessing Number


In section 2.10 of the C++14 standard ([lex.ppnumber]), preprocessing numbers are defined as

pp-number:
    digit
    . digit
    pp-number digit
    pp-number ' digit
    pp-number ' nondigit
    pp-number identifier-nondigit
    pp-number e sign
    pp-number E sign
    pp-number .

So this should include all integer literal tokens and all floating literal tokens. But in the grammar for floating literals in 2.14.4 ([lex.fcon]), the sign is optional:

exponent-part:
    e sign_opt digit-sequence
    E sign_opt digit-sequence
sign: one of
    + -

Why is the sign in the pp-number definition not optional? As written, the number 1e3 would be valid as a floating-literal but not as a pp-number, which contradicts the explanatory text in section 2.10.

Is there something I do not get?
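
One way to probe the claim about 1e3 is a compile-time check with token pasting. The sketch below assumes a conforming C++14 compiler; the macros CAT and CAT_IMPL and the constant pasted_value are hypothetical names introduced only for this illustration. It relies on the fact that e, like any other letter, is a nondigit and therefore also matches the identifier-nondigit production, so 1e and 1e3 can each be derived as a single pp-number without going through the "e sign" rule.

```cpp
// Hypothetical helper macros for this check; they are not part of the question.
#define CAT_IMPL(a, b) a##b
#define CAT(a, b) CAT_IMPL(a, b)

// Derivation of 1e3 under the quoted grammar:
//   1     matches  digit
//   1e    matches  pp-number identifier-nondigit   ('e' is a nondigit)
//   1e3   matches  pp-number digit
//
// "1e" is passed as one preprocessing token and pasted with "3",
// which yields the single pp-number 1e3.
constexpr double pasted_value = CAT(1e, 3);

// 1e3 then converts to a floating literal with value 1000.
static_assert(pasted_value == 1000.0, "1e3 should be the floating literal 1000.0");
```

If 1e were not a single preprocessing token, CAT(1e, 3) would paste only e and 3 into the identifier e3 and leave 1 e3 behind, which would not compile as an initializer.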


Solution

  • Quoting from the GCC C preprocessor manual (section on tokenization):

    A preprocessing number has a rather bizarre definition. The category includes all the normal integer and floating point constants one expects of C, but also a number of other things one might not initially recognize as a number. Formally, preprocessing numbers begin with an optional period, a required decimal digit, and then continue with any sequence of letters, digits, underscores, periods, and exponents. Exponents are the two-character sequences ‘e+’, ‘e-’, ‘E+’, ‘E-’, ‘p+’, ‘p-’, ‘P+’, and ‘P-’. (The exponents that begin with ‘p’ or ‘P’ are new to C99. They are used for hexadecimal floating-point constants.)

    The purpose of this unusual definition is to isolate the preprocessor from the full complexity of numeric constants. It does not have to distinguish between lexically valid and invalid floating-point numbers, which is complicated. The definition also permits you to split an identifier at any position and get exactly two tokens, which can then be pasted back together with the ‘##’ operator.

    It's possible for preprocessing numbers to cause programs to be misinterpreted. For example, 0xE+12 is a preprocessing number which does not translate to any valid numeric constant, therefore a syntax error. It does not mean 0xE + 12, which is what you might have intended.
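
The informal description quoted above can be sketched as a small scanner. This is only an illustration, not how any particular compiler implements it: the function name pp_number_length is hypothetical, the code follows the C++14 grammar from the question (so it has no p+/p- exponents, which the quote mentions for C99), and it ignores universal-character-names, which are also part of identifier-nondigit.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the length of the longest pp-number at the start of `s`,
// or 0 if `s` does not begin with a pp-number.
// Hypothetical helper; assumes plain ASCII input in the "C" locale.
std::size_t pp_number_length(const std::string& s) {
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    auto is_nondigit = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::size_t i = 0;
    // Base cases: "digit" or ". digit".
    if (!s.empty() && is_digit(s[0])) {
        i = 1;
    } else if (s.size() >= 2 && s[0] == '.' && is_digit(s[1])) {
        i = 2;
    } else {
        return 0;
    }

    // Recursive cases, applied greedily (maximal munch).
    while (i < s.size()) {
        const char c = s[i];
        const bool has_next = i + 1 < s.size();
        if ((c == 'e' || c == 'E') && has_next &&
            (s[i + 1] == '+' || s[i + 1] == '-')) {
            i += 2;  // pp-number e sign, pp-number E sign
        } else if (c == '\'' && has_next &&
                   (is_digit(s[i + 1]) || is_nondigit(s[i + 1]))) {
            i += 2;  // pp-number ' digit, pp-number ' nondigit
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            i += 1;  // pp-number digit, identifier-nondigit, or .
        } else {
            break;   // anything else ends the token
        }
    }
    return i;
}
```

With this sketch, pp_number_length("1e3") is 3 because e is consumed as an ordinary letter, and pp_number_length("0xE+12") is 6, so the whole spelling is one token that later fails to convert to a valid literal. For "0xF+12" the result is 3, which leaves + and 12 as separate tokens.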