
How does GCC/Clang buffer preprocessed code?


While implementing a C compiler, what would be the best way to buffer the preprocessed code, and how does GCC/Clang do it? Do they write the output to a separate file or do they just buffer it in memory?


Solution

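  • The result of preprocessing is a sequence of tokens, not a character string. In normal compilation neither GCC nor Clang writes it to an intermediate file: the preprocessor runs in the same process as the parser and hands it tokens directly. A textual preprocessed file is produced only on request, for example with -E or -save-temps.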

    Different compilers consume the incoming token queue in different ways. The last time I looked, GCC's C compiler generated the queue more or less on demand, although a single macro expansion enqueues several tokens at once. The C++ compiler, however, tokenises more aggressively (my memory is that it tokenises the entire translation unit up front, but I could be remembering incorrectly) because C++ parsing sometimes requires arbitrary lookahead.

    Whichever strategy you use, it's important to understand the difference between a character stream and a token stream, and to avoid tokenising twice. That's not just an efficiency gain; it's a way of avoiding subtle preprocessing bugs. See old versions of Visual C++ for abundant examples.