Tags: c++, c, compiler-construction, reserved-words

At what stage of compilation are reserved identifiers reserved?


Just a little curiosity at work, here. While working on something dangerous, I got to thinking about the implementations of various compilers and their associated standard libraries. Here's the progression of my thoughts:

  1. Some classes of identifiers are reserved for implementation use in C++ and C.

  2. A compiler must perform the stages of compilation (preprocessing, compilation, linking) as though they were performed in sequence.

  3. The C preprocessor is not aware of the reserved status of identifiers.

  4. Therefore, a program may use reserved identifiers if and only if:

    1. The reserved identifiers used are all preprocessor symbols.

    2. The preprocessing result does not include reserved identifiers.

    3. The identifiers do not conflict with symbols predefined by the compiler (__GNUC__ et al.)

Is this valid? I'm uncertain on points 3 and 4.3. Moreover, is there a way to test it?


Solution

  • (The comments on the question explain that we're talking about reserved identifiers in the sense of C99 section 7.1.3, i.e., identifiers matching /^_[A-Z_]/ anywhere, /^_/ in file scope, /^str[a-z]/ with external linkage, etc. So here's my guess at at least a part of what you're asking...)
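
    To make those patterns concrete, here is a minimal C sketch of the three checks. The function names (`reserved_everywhere`, `reserved_at_file_scope`, `future_str_name`) are made up for illustration, and the code covers only the three patterns quoted above, not the whole of 7.1.3. It also assumes an ASCII-like character set where the letters are contiguous.

```c
#include <stdbool.h>

/* /^_[A-Z_]/ : an underscore followed by an uppercase letter or a second
   underscore. Reserved for any use. */
bool reserved_everywhere(const char *id) {
  return id[0] == '_' && ((id[1] >= 'A' && id[1] <= 'Z') || id[1] == '_');
}

/* /^_/ : any leading underscore. Reserved at file scope. */
bool reserved_at_file_scope(const char *id) {
  return id[0] == '_';
}

/* /^str[a-z]/ : "str" followed by a lowercase letter. Reserved with
   external linkage for future library additions. */
bool future_str_name(const char *id) {
  return id[0] == 's' && id[1] == 't' && id[2] == 'r' &&
         id[3] >= 'a' && id[3] <= 'z';
}
```

    By these checks `_MYHEADER_H` is reserved everywhere, `_x` only at file scope, and `strip` falls in the future-`str` namespace.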

    They're not reserved in the sense that (any particular phase of) the compiler is expected to diagnose their misuse. Rather, they're reserved in that if you're foolish enough to (mis)use them yourself, you don't get to complain if your program stops working or stops compiling at a later date.

    We've all seen what happens when people with only a dangerous amount of knowledge look inside system headers and then write their own header guards:

    #ifndef _MYHEADER_H
    #define _MYHEADER_H
    // ...
    #endif
    

    They're invoking undefined behaviour, but nothing diagnoses this as "error: reserved identifier used by end-user code". Instead mostly they're lucky and all is well; but occasionally they collide with an identifier of interest to the implementation, and confusing things happen.
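
    For contrast, a guard spelled without the leading underscore stays out of the implementation's namespace. A minimal sketch, where the header name and the `myheader_answer` function are hypothetical placeholders:

```c
/* myheader.h (hypothetical) */
#ifndef MYHEADER_H
#define MYHEADER_H

/* Placeholder content, just so the header declares something. */
static inline int myheader_answer(void) { return 42; }

#endif /* MYHEADER_H */
```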

    Similarly, I often have an externally-visible function named strip() or so:

    char *strip(char *s) {
      // remove leading whitespace (spaces and tabs)
      while (*s == ' ' || *s == '\t')
        s++;
      return s;
    }
    

    By my reading of C99's 7.1.3, 7.26, and 7.26.11, this invokes undefined behaviour. However, I have decided not to care about this. The identifier is reserved not because anything bad is expected to happen today, but because the Standard reserves to itself the right to invent a new standard str-ip() routine in a future revision. I reckon string-ip, whatever that might be, is an unlikely name for a future string operation -- so in the unlikely event that it gets added, I'll cross that bridge when I get to it. Technically I'm invoking undefined behaviour, but I don't expect to get bitten.
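
    If you would rather not rely on that bet, renaming is enough. A minimal sketch, where `skip_leading_space` is just a hypothetical name that avoids the `str` prefix:

```c
/* Returns a pointer to the first character of s that is not a space or tab. */
char *skip_leading_space(char *s) {
  while (*s == ' ' || *s == '\t')
    s++;
  return s;
}
```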

    Finally, a counter-example to your point 4:

    #include <string.h>
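    void *my_crazy_function(size_t n, const void *s);  // declared so the call below compiles; signature assumed
    #define memcpy(d,s,n)  (my_crazy_function((n), (s)))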
    void foo(char *a, char *b) {
      memcpy(a, b, 5);  // intends to invoke my_crazy_function
      memmove(a, b, 5); // standard behaviour expected
    }
    

    This complies with your 4.1, 4.2, 4.3 (if I understand your intention on that last one). However, if memmove is additionally implemented as a macro (via 7.1.4/1) that is written in terms of memcpy, then you're going to be in trouble.
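
    The failure mode can be simulated without touching the real library. In this sketch `lib_copy` and `lib_move` are hypothetical stand-ins for `memcpy` and `memmove`, with `lib_move` written as a macro in terms of `lib_copy` the way an implementation would be allowed to do it; `hijack_calls` exists only to count calls.

```c
#include <stddef.h>

/* --- stand-in for the implementation's header (hypothetical names) --- */
void *lib_copy(void *d, const void *s, size_t n) {
  unsigned char *dp = d;
  const unsigned char *sp = s;
  while (n--)
    *dp++ = *sp++;
  return d;
}
#define lib_move(d, s, n) lib_copy((d), (s), (n))

/* --- user code, mirroring the counter-example --- */
int hijack_calls = 0;

void *my_crazy_function(size_t n, const void *s) {
  (void)n;
  (void)s;
  hijack_calls++;
  return NULL;
}
#define lib_copy(d, s, n) (my_crazy_function((n), (s)))

void foo(char *a, char *b) {
  lib_copy(a, b, 5);  /* intends to invoke my_crazy_function */
  lib_move(a, b, 5);  /* expands to lib_copy(...), so it is hijacked too */
}
```

    Macros are expanded at the point of use, so the user's later `#define` captures the `lib_copy` inside `lib_move` as well: after `foo` runs, `hijack_calls` is 2 and nothing has been copied into `a`.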