Restricting preprocessing-numbers in a C preprocessor to only handle valid floating and integer constants

I'm currently implementing a C11 compiler, and I'm aiming to integrate the preprocessor into the rest of the compiler rather than keep it as a stand-alone component. As such, the preprocessor can safely assume that its output will be consumed directly by the following stages.

From what I've read about the preprocessing-number token, it seems to exist only to simplify the implementation of a stand-alone preprocessor: because the format of numbers is simplified, the preprocessor doesn't have to handle the full complexity of numeric constants. Quoting the GCC docs:

The purpose of this unusual definition is to isolate the preprocessor from the full complexity of numeric constants. It does not have to distinguish between lexically valid and invalid floating-point numbers, which is complicated.

As the preprocessor will be integrated into the rest of the compiler framework, this is not an issue for me.
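For reference, the pp-number grammar in C11 §6.4.8 is simple enough to scan in a few lines, which is what makes it attractive to a stand-alone preprocessor. Below is a minimal sketch, assuming ASCII input only: universal character names and other implementation-defined identifier characters are not handled, and the function name `scan_pp_number` is hypothetical. It returns the length of the pp-number at the start of a string, or 0 if there is none.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the pp-number at the start of s, or 0.
   ASCII-only sketch of the C11 6.4.8 grammar (no universal character
   names or implementation-defined identifier characters). */
static size_t scan_pp_number(const char *s)
{
    size_t n;

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        n = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        n = 2;
    else
        return 0;

    for (;;) {
        char c = s[n];
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (s[n + 1] == '+' || s[n + 1] == '-'))
            n += 2;     /* pp-number e sign, E sign, p sign, P sign */
        else if (isalnum((unsigned char)c) || c == '_' || c == '.')
            n += 1;     /* pp-number digit, identifier-nondigit, or '.' */
        else
            return n;   /* anything else ends the pp-number */
    }
}
```

Note that the scanner never checks whether the result is a valid constant: `3.4.6a` and `0xe+1` are each a single pp-number, even though neither can be converted to an integer or floating constant.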

Section 6.4.8, paragraph 4 [Preprocessing numbers; Semantics] of the C11 standard states:

A preprocessing number does not have type or a value; it acquires both after a successful conversion (as part of translation phase 7) to a floating constant token or an integer constant token.

So it seems like every preprocessing-number will be converted into a floating or integer constant later on in the compilation process. I cannot find any other references to preprocessing-numbers in the standard, so it seems like this is their only purpose, but I may be wrong.
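To make the proposed restriction concrete, here is a minimal sketch of the integer half of such a check: deciding whether a pp-number's spelling is a valid C11 integer constant (decimal, octal or hexadecimal digits followed by an optional integer suffix). The names `is_integer_constant` and `is_integer_suffix` are hypothetical. Floating constants, range/overflow checks, and anything beyond C11 (such as binary literals) are deliberately left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is s one of the C11 integer suffixes
   (u/U combined with l/L/ll/LL in either order), or empty? */
static bool is_integer_suffix(const char *s)
{
    static const char *const ok[] = {
        "", "u", "U", "l", "L", "ll", "LL",
        "ul", "uL", "Ul", "UL", "ull", "uLL", "Ull", "ULL",
        "lu", "lU", "Lu", "LU", "llu", "llU", "LLu", "LLU",
    };
    for (size_t i = 0; i < sizeof ok / sizeof ok[0]; i++)
        if (strcmp(s, ok[i]) == 0)
            return true;
    return false;
}

/* Hypothetical helper: is the whole spelling a C11 integer-constant? */
static bool is_integer_constant(const char *s)
{
    const char *p = s;

    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        p += 2;
        if (!isxdigit((unsigned char)*p))
            return false;               /* "0x" needs at least one digit */
        while (isxdigit((unsigned char)*p))
            p++;
    } else if (*p == '0') {
        while (*p >= '0' && *p <= '7')  /* octal; the leading 0 counts */
            p++;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        return false;
    }
    return is_integer_suffix(p);        /* the rest must be a suffix */
}
```

For example, `08` and `6a` are both pp-numbers but are rejected here. A check like this only makes sense at the point where a pp-number is actually converted to a token; whether it is safe to apply it any earlier is exactly the question.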

My question is, would it be valid for the preprocessor to restrict preprocessing-numbers to only valid integer and floating point constants? Or are there cases where having such a restriction would cause otherwise valid programs to fail?

Solution

There are certainly valid programs that contain pp-numbers which cannot be converted to an integer or floating constant. The usual case is a preprocessing token that never becomes a token.

For example, it might be stringified:


#include <stdio.h>
#define STRINGIFY_(X) #X
#define STRINGIFY(V)  STRINGIFY_(V)
#define VERSION 3.4.6a
#define PROGNAME foo

int main(void) {
  printf("%s-%s\n", STRINGIFY(PROGNAME), STRINGIFY(VERSION));
}

Moreover, the version number in the above example could have been produced by token concatenation, another way in which preprocessing tokens never become program tokens:


#include <stdio.h>
#define STRINGIFY_(X) #X
#define STRINGIFY(V)  STRINGIFY_(V)
#define CONCAT3_(x,y,z) x##y##z
#define CONCAT3(x,y,z) CONCAT3_(x,y,z)
#define CONCAT_V(mj, mn, pl) CONCAT3(mj, ., CONCAT3(mn, ., pl))

#define MAJOR 3
#define MINOR 4
#define PATCH 6a

#define VERSION CONCAT_V(MAJOR, MINOR, PATCH)
#define PROGNAME foo

int main(void) {
  printf("%s-%s\n", STRINGIFY(PROGNAME), STRINGIFY(VERSION));
}

There are other ways for a pp-number (or any other preprocessing token) to never be converted to a token:

As the argument to a macro which does not use the corresponding parameter in its replacement text.
In program text in a preprocessor conditional whose controlling expression is false.

The second case is often used "in the wild" to hide partially written code inside an #if 0 … #endif block. The excluded code may contain almost arbitrary syntax errors, including invalid pp-numbers and even stray punctuation, as long as comments, string literals and character constants are terminated. (@ is a valid preprocessing token which cannot be converted to a token.)
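Both of the cases listed above can be exercised in one small translation unit. This is a minimal sketch, assuming a conforming C11 compiler; the macro `DISCARD` and the function `discarded_value` are hypothetical names chosen for illustration. It should compile cleanly even though `1.2.3e+x`, `08` and `@` can never be converted to tokens.

```c
/* Hypothetical macro that ignores its parameter: the pp-number passed
   as the argument is discarded and never converted to a token. */
#define DISCARD(x) 0

static int discarded_value(void)
{
    return DISCARD(1.2.3e+x);   /* expands to 0 */
}

#if 0
    /* Skipped group: it is split into preprocessing tokens, but none of
       them are ever converted to tokens, so invalid pp-numbers and a
       stray @ are harmless here. */
    int half_written = 08 + 1.2.3 + 0xe+1;
    @ stray
#endif
```

A preprocessor that rejected pp-numbers which are not valid constants at tokenization time would refuse both of these, even though the program itself never uses them as constants.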