Tags: c, language-lawyer, c-preprocessor, undefined-behavior, c11

Why is the C preprocessor a subject of undefined behavior?


I can understand that:

  • One of the origins of UB is performance: it lets the compiler remove code that can never execute, such as if (i+1 < i) { /* never_executed_code */ } (UPD: if i is a signed integer).
  • UB can be triggered at compile time because C does not clearly distinguish between compile time and run time. The "whole language is based on the (rather unhelpful) concept of an 'abstract machine'" (link).
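
To make the first point concrete, here is a minimal sketch. The function names are hypothetical and chosen only for illustration. Because signed overflow is undefined, a compiler is allowed to assume that i + 1 never wraps and may reduce the naive check to a constant false. The second function expresses the same intent without relying on overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: tries to detect wrap-around after the fact.
   For i == INT_MAX the addition overflows, which is undefined behavior,
   so an optimizer may treat the whole comparison as always false. */
bool next_wraps_naive(int i) {
    return i + 1 < i;
}

/* Well-defined alternative: test the operand before adding anything. */
bool next_wraps_checked(int i) {
    return i == INT_MAX;
}
```

Calling next_wraps_naive with any value other than INT_MAX is well defined and returns false; only the INT_MAX case is left to the implementation.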

However, I still cannot understand why the C preprocessor is subject to undefined behavior. Preprocessing directives are known to be executed at compile time.

Consider C11, 6.10.3.3 The ## operator, paragraph 3:

If the result is not a valid preprocessing token, the behavior is undefined.
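
For illustration, a small sketch of what that sentence covers. PASTE, struct point and get_x are hypothetical names, not anything from the standard. The first three pastes each form a single valid preprocessing token; the last one, left inside a comment, would not.

```c
/* Hypothetical helper macro: pastes its two arguments together. */
#define PASTE(a, b) a##b

struct point { int x; };

/* Valid: `foo` ## `bar` forms the single identifier `foobar`. */
int PASTE(foo, bar) = 42;

/* Valid: `1` ## `0` forms the single pp-number `10`. */
int ten = PASTE(1, 0);

/* Valid: `-` ## `>` forms the single punctuator `->`. */
int get_x(struct point *p) {
    return p PASTE(-, >) x;
}

/* Undefined per 6.10.3.3p3: `.` ## `x` would form `.x`, which is not a
   single preprocessing token. Kept as a comment because the outcome is
   up to the implementation:

       int bad = some_point PASTE(., x);
*/
```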

Why not make it a constraint? For example:

The result shall be a valid preprocessing token.

The same question goes for all the other instances of "the behavior is undefined" in 6.10 Preprocessing directives.


Solution

  • Why is the C preprocessor a subject of undefined behavior?

    When the C standard was created, several C preprocessors already existed, and the standardization committee members also had an imagined ideal C preprocessor in mind.

    So there were gray areas where committee members weren't completely sure what they wanted, and/or where existing C preprocessor implementations differed from each other in behavior.

    So these cases are not defined behavior. Because the committee members were not completely sure what the behavior should actually be, the standard places no requirement on it.

    One of the origins of the UB

    Yes, one of.

    UB may exist to make the language easier to implement. For example, preprocessor writers don't have to care about what happens when ## produces an invalid preprocessing token.

    Or UB may exist to reconcile existing implementations that behave differently, or to leave room for extensions. So a preprocessor that segfaults on UB, a preprocessor that accepts the input and works, and a preprocessor that formats your hard drive can all be standard-conformant (but I wouldn't want to work with the one that formats your drive).
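
A practical consequence is that code which only works on a lenient preprocessor is not portable. As a rough sketch, with the hypothetical names FIELD, FIELD_UB, struct point and sum_fields, the paste can often simply be dropped, because adjacent tokens do not need ## to sit next to each other:

```c
struct point { int x; int y; };

/* Relies on undefined behavior: `.` ## name does not form a single
   preprocessing token. Some preprocessors accept it, others reject it.

       #define FIELD_UB(obj, name) obj.##name
*/

/* Portable: `.` and the member name stay two separate tokens. */
#define FIELD(obj, name) ((obj).name)

int sum_fields(struct point p) {
    return FIELD(p, x) + FIELD(p, y);
}
```

Here ## is reserved for cases where two pieces must become one token, such as building an identifier.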