Search code examples
c++cc-preprocessorstring-literals

Implementation of string literal concatenation in C and C++


AFAIK, this question applies equally to C and C++

Step 6 of the "translation phases" specified in the C standard (5.1.1.2 in the draft C99 standard) states that adjacent string literals have to be concatenated into a single literal. I.e.

printf("helloworld.c" ": %d: Hello "
       "world\n", 10);

Is equivalent (syntactically) to:

printf("helloworld.c: %d: Hello world\n", 10);

However, the standard doesn't seem to specify which part of the compiler has to handle this - should it be the preprocessor (cpp) or the compiler itself. Some online research tells me that this function is generally expected to be performed by the preprocessor (source #1, source #2, and there are more), which makes sense.

However, running cpp in Linux shows that cpp doesn't do it:

eliben@eliben-desktop:~/test$ cat cpptest.c 
int a = 5;

"string 1" "string 2"
"string 3"

eliben@eliben-desktop:~/test$ cpp cpptest.c 
# 1 "cpptest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cpptest.c"
int a = 5;

"string 1" "string 2"
"string 3"

So, my question is: where should this feature of the language be handled, in the preprocessor or the compiler itself?

Perhaps there's no single good answer. Heuristic answers based on experience, known compilers, and general good engineering practice will be appreciated.


P.S. If you're wondering why I care about this... I'm trying to figure out whether my Python based C parser should handle string literal concatenation (which it doesn't do, at the moment), or leave it to cpp which it assumes runs before it.


Solution

  • The standard doesn't specify a preprocessor vs. a compiler, it just specifies the phases of translation you already noted. Traditionally, phases 1 through 4 were in the preprocessor, Phases 5 though 7 in the compiler, and phase 8 the linker -- but none of that is required by the standard.