
Stringize operator failure


The C and C++ standards all include text to the effect that if a stringize operation fails to produce a valid string literal token, the behavior is undefined. In C++11 this can actually happen: stringize a macro argument that is a raw string literal containing a newline character. But the catch-all has always been there in the standards.

Is there any other way that stringize can produce UB, where UB or an ill-formed program hasn't already happened?

I'd be interested to hear about any dialect of C or C++ whatsoever. I'm writing a preprocessor.


Solution

  • The stringify (#) operator escapes \ and " only when they appear inside string literals and character constants. Outside of those, \ has no particular significance, except at the end of a line, so a lone \ is a preprocessing token in its own right (C section 6.4, C++ section 2.5), and stringify copies it unescaped.

    Consequently, if we have

    #define Q(X) #X
    

    then

    Q(\)
    

    is a legitimate invocation: the \ is a preprocessing token which is never converted to a token, so it's valid. But you can't stringify \; that would give you "\", in which the backslash escapes the closing quote, so it is not a valid string literal. Hence, the behaviour of the above is undefined.

    Here's a more amusing test case:

    #define Q(A) #A
    #define ESCAPE(c) Q(\c)
    const char* new_line=ESCAPE(n);
    const char* undefined_behaviour=ESCAPE(x);
    

    A less interesting case of an undefined stringify is where the stringified parameter would be too long to be a string literal. (The C++ standard recommends that the maximum size of a string literal be at least 65536 characters, and C99 requires support for at least 4095, but neither says anything about the maximum size of a macro argument, which could presumably be larger.)
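
For someone writing a preprocessor, the cases above can be sketched as a check on the spelling that # produces before it is turned into a string literal token. The following is only a rough illustration, not how any particular compiler does it. The names `is_valid_string_literal` and `hex_run` and the three sample arrays are made up for this example, and the check assumes a plain (non-raw, unprefixed) literal; it does not validate the numeric range of octal, hex, or universal-character escapes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define Q(X) #X

/* Well-defined cases: the spelling produced by # is a valid string literal. */
const char from_string[] = Q("a\nb"); /* spelled "\"a\\nb\"": quotes and backslash escaped */
const char from_char[]   = Q('\t');   /* spelled "'\\t'": backslash escaped */
const char two_slashes[] = Q(\\);     /* spelled "\\": two bare tokens, copied unescaped */

/* Hypothetical helper: true if the n characters starting at p are all hex digits.
   The terminating NUL is not a hex digit, so this never reads past the string. */
static bool hex_run(const char *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!isxdigit((unsigned char)p[i])) return false;
    return true;
}

/* Hypothetical check: is s, the spelling produced by #, a plain string literal? */
bool is_valid_string_literal(const char *s) {
    size_t len = strlen(s);
    if (len < 2 || s[0] != '"' || s[len - 1] != '"') return false;
    for (size_t i = 1; i < len - 1; i++) {
        /* An unescaped quote or a raw newline cannot appear inside the literal. */
        if (s[i] == '"' || s[i] == '\n') return false;
        if (s[i] != '\\') continue;
        i++;                               /* move to the character after the backslash */
        if (i == len - 1) return false;    /* the backslash ate the closing quote: "\" */
        char c = s[i];
        if (strchr("'\"?\\abfnrtv", c) || (c >= '0' && c <= '7')) continue;
        if (c == 'x' && isxdigit((unsigned char)s[i + 1])) continue;
        if (c == 'u' && hex_run(s + i + 1, 4)) continue;
        if (c == 'U' && hex_run(s + i + 1, 8)) continue;
        return false;                      /* unknown or incomplete escape, e.g. "\x" */
    }
    return true;
}
```

Under these assumptions, the spelling `"\n"` produced by `ESCAPE(n)` passes the check, while the spellings `"\"` from `Q(\)` and `"\x"` from `ESCAPE(x)` are rejected, as is a spelling containing a raw newline (the C++11 raw string case from the question). What to do on rejection is up to the implementation, since the behaviour is undefined; a diagnostic seems the friendliest choice.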