c++, parsing, compiler-construction, preprocessor

How do parsers handle preprocessors and conditional compilation?


I am trying to figure out how parsers handle preprocessing and conditional compilation. Using C++ as an example: are preprocessor directives part of the C++ grammar rules, or is the preprocessor a separate language whose processing happens before parsing? In either case, how can a parser detect errors in all possible branches and recover information about the original code layout before preprocessing (such as the line number where an error occurred)?


Solution

  • Taken from the C Preprocessor docs:

    The C preprocessor informs the C compiler of the location in your source code where each token came from.

    So in the case of GCC, the parser knows where the errors occur, because the preprocessor tells it. I am unsure whether this quotation refers to preprocessing tokens, or all C++ tokens.

    This page on cpplib internals has a few more details on how the magic happens:

    The cpp_token structure contains line and col members. The lexer fills these in with the line and column of the first character of the token. Consequently, but maybe unexpectedly, a token from the replacement list of a macro expansion carries the location of the token within the #define directive, because cpplib expands a macro by returning pointers to the tokens in its replacement list.

    [...] This variable therefore uniquely enumerates each line in the translation unit. With some simple infrastructure, it is straight forward to map from this to the original source file and line number pair
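The "simple infrastructure" mentioned in the quote can be pictured as a sorted table of ranges. The following is only a minimal sketch under that assumption, not cpplib's actual data structure: `LineMapEntry`, `LineMap`, `enter`, and `lookup` are hypothetical names invented for illustration. Each entry records where a run of logical lines (the running counter over the whole translation unit) begins and which file and physical line it corresponds to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified line map (not cpplib's real structure).
struct LineMapEntry {
    unsigned first_logical;   // first logical line covered by this entry
    std::string file;         // file those lines came from
    unsigned first_physical;  // line in `file` matching first_logical
};

class LineMap {
public:
    // Record that logical lines starting at `logical` come from `file`,
    // beginning at line `physical`. Entries must be added in increasing order.
    void enter(unsigned logical, std::string file, unsigned physical) {
        assert(entries_.empty() || entries_.back().first_logical < logical);
        entries_.push_back({logical, std::move(file), physical});
    }

    // Map a logical line back to its (file, line) pair.
    std::pair<std::string, unsigned> lookup(unsigned logical) const {
        assert(!entries_.empty() && logical >= entries_.front().first_logical);
        // Binary search for the last entry with first_logical <= logical.
        std::size_t lo = 0, hi = entries_.size();
        while (hi - lo > 1) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (entries_[mid].first_logical <= logical) lo = mid; else hi = mid;
        }
        const LineMapEntry& e = entries_[lo];
        return {e.file, e.first_physical + (logical - e.first_logical)};
    }

private:
    std::vector<LineMapEntry> entries_;
};
```

For example, if `main.cpp` includes a four-line `util.h` on its third line, one would call `enter(1, "main.cpp", 1)`, `enter(4, "util.h", 1)`, and `enter(8, "main.cpp", 4)`. Then `lookup(6)` yields line 3 of `util.h`, and `lookup(9)` yields line 5 of `main.cpp`.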

    Here is a copy of the C++14(?) draft standard. The preprocessing grammar is in Appendix A.14. I'm not sure it matters whether you call it a separate language or not. Per [lex.phases] (section 2.2), C++ compilers behave as if preprocessing happens before the main translation and parsing, so the parser only ever sees the tokens that survive preprocessing.
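Two consequences of that phase ordering can be checked with a small experiment. This is a sketch for illustration: the function names and the file name `virtual_file.cpp` are made up. First, a group skipped by `#if 0` never reaches the parser, so a compiler does not diagnose syntax errors in an inactive branch. Second, the standard `#line` directive lets preprocessed or generated code state which file and line the following text should be attributed to, which is visible through `__LINE__` and `__FILE__`.

```cpp
#include <cassert>
#include <cstring>

#if 0
This group is skipped in the preprocessing phase, so the C++ parser
never sees these tokens and cannot report the errors in them ;;; }}}
#endif

// The #line directive inside this function renumbers the lines that follow
// it and renames the file used in diagnostics and in __LINE__ / __FILE__.
inline int reported_line() {
#line 100 "virtual_file.cpp"
    return __LINE__;  // this source line is now treated as line 100
}

inline const char* reported_file() {
    return __FILE__;  // still under the effect of the #line above
}

// Small self-check, callable from main() or from a test.
inline void check_line_directive() {
    assert(reported_line() == 100);
    assert(std::strcmp(reported_file(), "virtual_file.cpp") == 0);
}
```

Checking every conditional branch therefore takes something beyond a single compile, such as building once per macro configuration. A conforming compiler only parses the branch that the preprocessor keeps.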