Tags: c++, c-preprocessor, flex-lexer

Flex C++ - #ifdef inside flex block


I want to define a constant in the preprocessor so that certain patterns are matched only when it is defined. Is this possible, or is there another way to deal with this problem? For example, here is a simplified version of removing one-line comments in C:

%{
#define COMMENT
%}

%%
#ifdef COMMENT
[\/][\/].*$ ;
#endif

[1-9][0-9]* printf("It's a number, and it works with and without defining COMMENT");
%%

Solution

  • There is no great solution to this (very reasonable) request, but there are some possibilities.

    (F)lex start conditions

    Flex start conditions make it reasonably simple to define a few optional patterns, but they don't compose well. This solution works best if you have only a single controlling variable, since you would have to define a separate start condition for every possible combination of controlling variables.

    For example:

    %s NO_COMMENTS
    %%
    
    <NO_COMMENTS>"//".*     ; /* Ignore comments in NO_COMMENTS mode. */
    

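    The %s declaration makes NO_COMMENTS an inclusive start condition, which means that all unmarked rules also apply in the NO_COMMENTS state. You will commonly see %x ("exclusive") in examples, but with an exclusive condition only the rules explicitly marked <NO_COMMENTS> would be active in that state, so you would have to mark almost every rule.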
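    With more than one controlling variable, the number of states grows quickly. The following rules-section fragment is a sketch that assumes two independent switches: the question's // stripping and a hypothetical second option that strips # comments. The state names STRIP_SLASH, STRIP_HASH and STRIP_BOTH are made up for illustration; each optional rule has to list every state in which it should be active.

```lex
%s STRIP_SLASH STRIP_HASH STRIP_BOTH
%%
<STRIP_SLASH,STRIP_BOTH>"//".*   ; /* active whenever // stripping is on */
<STRIP_HASH,STRIP_BOTH>"#".*     ; /* active whenever # stripping is on */
[1-9][0-9]*                      printf("number: %s\n", yytext);
```

    Two switches already need three states besides INITIAL; a third switch would need seven.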

    Once you have modified your scanner definition in this way, you can select the appropriate set of rules at run time by setting the lexer's state with BEGIN(INITIAL) or BEGIN(NO_COMMENTS). (The BEGIN macro is only defined in the flex-generated file, so you will want to export a function that performs one of these two actions.)
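    Putting this together, a complete scanner might look like the sketch below. It is one possible arrangement, not the only one: the helper name set_strip_comments and the convention that any command-line argument turns stripping on are assumptions made for illustration. The helper lives in the user-code section, where BEGIN is still visible, and can be called before the first call to yylex() or between calls.

```lex
%option noyywrap nounput noinput
%{
#include <stdio.h>
%}
%s NO_COMMENTS
%%
<NO_COMMENTS>"//".*   ;  /* discard the rest of the line, only in NO_COMMENTS */
[1-9][0-9]*           printf("number: %s\n", yytext);
.|\n                  ECHO;  /* copy everything else through unchanged */
%%
/* Hypothetical helper: code in other files cannot use BEGIN, so it calls this. */
void set_strip_comments(int enabled)
{
    if (enabled)
        BEGIN(NO_COMMENTS);
    else
        BEGIN(INITIAL);
}

int main(int argc, char **argv)
{
    (void)argv;
    set_strip_comments(argc > 1);  /* any argument enables comment stripping */
    yylex();
    return 0;
}
```

    Built with something like flex scanner.l && cc lex.yy.c -o scanner, the input line 42 // 7 should report both numbers when run as ./scanner, but only 42 when run as ./scanner strip, because the comment rule then consumes "// 7" as the longest match.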

    Using cpp as a utility

    Flex has no preprocessor feature of its own. You could run a C preprocessor over your flex file before passing it to flex, but you will have to be very careful with your input file:

    1. The C preprocessor expects its input to be a sequence of valid C preprocessor tokens. Many common flex patterns will not match this assumption, because of the very different quoting rules. (For a simple example, a common pattern to recognise C comments includes the character class [^/*] which will be interpreted by the C preprocessor as containing the start of a C comment.)

    2. The flex input file is likely to contain a number of lines which are valid #include directives, and there is no way to prevent these directives from being expanded (other than removing them from the file). Once they have been expanded into the source, the header contents no longer carry their include guards, so including the same headers a second time (as flex's own template does) can produce duplicate definitions. You will therefore have to tell flex not to insert any #include directives from its own templates. I believe that is possible, but it will be a bit fragile.

    3. The C preprocessor may expand what looks to it like a macro invocation.

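    4. The C preprocessor might not preserve horizontal whitespace (spaces and tabs), which can alter the meaning of the flex scanner definition: flex treats indented lines differently from lines that start in the first column, and a pattern ends at the first unquoted whitespace.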

    m4 and other preprocessors

    It would be safer to use m4 as a preprocessor, but of course that means learning m4. (You shouldn't need to install it, because flex already depends on it: if you have flex, you also have m4.) You will still need to be very careful with quoting sequences, but m4 lets you customize them, so it is more manageable than cpp. Don't copy the common idiom of defining [[ as a quote delimiter, though; it appears very often inside regular expressions, for example in character classes such as [[:digit:]].

    Also, m4 does not insert #line directives by default, and any non-trivial use will change the number of input lines, making error messages harder to interpret (to say nothing of the challenge of debugging). You can probably avoid this issue in this very simple case, but it will reappear as your use of m4 grows.

    You could also write your own simple preprocessor, but you will still need to address the above issues.
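    As a rough illustration of that route, the C sketch below sidesteps the quoting and #include problems by giving the preprocessor its own directive syntax that neither flex nor C uses: hypothetical @ifdef NAME and @endif lines starting in the first column. It also keeps line numbers stable by writing an empty line in place of every directive and every suppressed line (flex ignores blank lines between rules, and they are harmless in C code), so flex's error messages still point at the right lines of the original file. It supports nesting but nothing else (no @else, no macro expansion), and the function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if name appears in the NULL-terminated list of defined names. */
static int is_defined(const char *name, const char *const *defined)
{
    for (; *defined != NULL; ++defined)
        if (strcmp(name, *defined) == 0)
            return 1;
    return 0;
}

/* Copy in to out, honouring hypothetical "@ifdef NAME" and "@endif" lines.
   Directive lines and suppressed lines are written as empty lines so that
   line numbers in the output match the input.
   Returns 0 on success, -1 if the directives are unbalanced. */
int preprocess(FILE *in, FILE *out, const char *const *defined)
{
    char line[4096];
    char name[256];
    int depth = 0;      /* number of currently open @ifdef blocks */
    int skip_depth = 0; /* depth of the @ifdef that started suppression, 0 if none */

    while (fgets(line, sizeof line, in) != NULL) {
        /* 0 for a partial chunk (overlong line, or last line without '\n') */
        int complete = strchr(line, '\n') != NULL;

        if (sscanf(line, "@ifdef %255s", name) == 1) {
            ++depth;
            if (skip_depth == 0 && !is_defined(name, defined))
                skip_depth = depth;     /* start suppressing at this level */
            fputc('\n', out);
        } else if (strncmp(line, "@endif", 6) == 0) {
            if (depth == 0)
                return -1;              /* @endif without a matching @ifdef */
            if (skip_depth == depth)
                skip_depth = 0;         /* leaving the suppressed block */
            --depth;
            fputc('\n', out);
        } else if (skip_depth != 0) {
            if (complete)
                fputc('\n', out);       /* keep the line count, drop the text */
        } else {
            fputs(line, out);           /* ordinary line: copy unchanged */
        }
    }
    return depth == 0 ? 0 : -1;         /* -1 if an @ifdef was never closed */
}
```

    A small command-line wrapper could call preprocess(stdin, stdout, (const char *const *)(argv + 1)), relying on argv being NULL-terminated (the cast is needed because C does not convert char ** to const char *const * implicitly), and sit in front of flex as a filter, e.g. flexpp COMMENT < scanner.in.l > scanner.l; the program and file names are hypothetical.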