I'm fairly sure this question is a duplicate, but I can't find anything related, so... I have a macro like this:
#define MACRO(x) #x
Is it allowed to do something like:
const char *var = MACRO(file.);
or
const char *var = MACRO(.ext);
or use both dots:
const char *var = MACRO(.field.);
GCC and clang both compile it and it runs properly, but according to the C standard, which characters are allowed in macro arguments? Are there characters I can't use here, or is any ASCII character allowed? And which paragraph of the C standard says so?
During translation of a C program, the source code is parsed into preprocessor tokens and sequences of white-space characters, and each comment is changed to a space, per C 2018 5.1.1.2. Macro replacement operates on this sequence of preprocessor tokens and white-space characters, not on raw text, per C 2018 6.10.3 10.
The arguments for the macro shown in the question must be a list of preprocessor tokens and white-space characters without a bare comma (because a comma would indicate a second argument per C 2018 6.10.3 11, but the macro is defined with only one parameter).
So, to start, MACRO(,) and MACRO(a,b) would not be defined invocations of the macro. You also cannot include an isolated close-parenthesis in the argument because it would be taken as the end of the argument list. (You can have nested pairs of open- and close-parentheses, as in MACRO(a(b,c)d). Note the comma here is “protected” by the parentheses, and the first ) is taken as the close for the opening ( inside the argument, not the end of the macro arguments.)
Preprocessor tokens are defined in C 2018 6.4 1, which refers to grammatical tokens defined elsewhere in the C standard: identifier, pp-number, character-constant, string-literal, punctuator, and each non-white-space character that cannot be one of the above. (6.4 also includes header-name, but this is only used in #include and #pragma directives, per C 2018 6.4 4.)
So, to know what sequences of characters are defined in a macro invocation, you need to consider each of those grammatical tokens.
Note the last option in that list does not mean any character is allowed. A C implementation has some source character set, and there may be physical characters in a file that are not mapped to any character in the source character set, and a compiler could reject such characters.
Aside from such excluded characters, that last option means very many things will be accepted as sequences of preprocessor tokens and white space, because any sequence of non-excluded characters that does not form a regular preprocessing token can be taken as a “non-white-space character that cannot be one of the above.” However, specifying exactly what works is tricky. Since the preprocessing tokens include header names, character-constants, and string literals, the problematic characters mentioned above—comma and parentheses—may interact to become part of one of those tokens instead of a stand-alone character that is taken as a macro argument separator or terminator.
Another sequence for which the behavior is explicitly undefined, in C 2018 6.10.3 11, is one containing what would otherwise be a preprocessing directive, as in:
MACRO(
#define X Y
)
Additionally, when using the # operator in a macro, the result must form a valid string literal. For example, MACRO(\) would form "\", which is not a valid string literal. Also with #, sequences of white-space characters are replaced by a single space, so MACRO(aXb), where X is a sequence of spaces, tabs, and new-line characters, would become just "a b".