
Are there formatting rules to follow when using flex?


I don't understand why, of two functionally identical source files, only one makes it through flex and compilation, while the other produces errors about the use of an undeclared identifier.

This one is OK (I don't usually use tabs in my editor; the indentation is all spaces):

        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;

%%
int main()
        {
        yylex();
        printf( "# of lines = %d, # of chars = %d\n",
                num_lines, num_chars );
        }

This one is not accepted by flex and produces nothing but errors:

int num_lines = 0, num_chars = 0;

%%
\n  ++num_lines; ++num_chars;
.   ++num_chars;

%%

int main()
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars );
}

Do I have to follow a specific convention if I want to compile my scanner with flex?


Solution

  • Yes, there are formatting rules in lex/flex and you are violating them.

    I'll summarise. A lex/flex input file has three sections, separated by the %% delimiter in column one (at the start of a line). The third section is optional, and if it is missing the second %% can be left out too. The first (definitions) section is for lexical declarations; in it, regular expressions can be given names. The second (rules) section specifies the actions to be performed when patterns match. The third (user code) section holds C code that is copied to the output file; it is typically used to define helper functions called from the actions, or main().

    The standard format for the first (lex declaration) section is:

    name     pattern
    

    where the name must start in column one (at the start of the line) and is separated from the pattern, on the same line, by white space.
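
    To make the name/pattern pairing concrete: the flex manual describes a later reference {name} as expanding to (definition). The C sketch below models only that textual substitution. expand_definition() and the DIGIT definition are hypothetical examples, not part of flex, and the sketch ignores quoting, escapes and character classes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of flex's name expansion: copies pattern to out,
 * replacing every "{name}" with "(def)". Quoting, escapes and character
 * classes are ignored. Returns 0 on success, -1 if out is too small. */
int expand_definition(const char *pattern, const char *name, const char *def,
                      char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    size_t used = 0;
    char wrapped[256];

    if (out_size == 0)
        return -1;

    while (*pattern != '\0') {
        const char *piece = pattern;   /* default: copy one character */
        size_t piece_len = 1, skip = 1;

        if (pattern[0] == '{' && strncmp(pattern + 1, name, name_len) == 0 &&
            pattern[1 + name_len] == '}') {
            /* "{name}" found: substitute the parenthesised definition */
            int n = snprintf(wrapped, sizeof wrapped, "(%s)", def);
            if (n < 0 || (size_t)n >= sizeof wrapped)
                return -1;
            piece = wrapped;
            piece_len = (size_t)n;
            skip = name_len + 2;
        }
        if (used + piece_len >= out_size)
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        pattern += skip;
    }
    out[used] = '\0';
    return 0;
}
```

    For example, with DIGIT defined as [0-9] in the first section, a rule pattern {DIGIT}+ becomes ([0-9])+. This is why a definition is meant to be a regular expression, not C code.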

    The format for the second (action) section is similar:

    pattern   action
    

    where the pattern must start in column one (at the start of the line) and is separated from the action, on the same line, by white space. The pattern ends at the first white space that is not escaped, quoted, or inside a character class, so it is the action, not the pattern, that can continue onto the following lines: enclose a multi-line action in braces { }.
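
    The rule that unquoted white space ends the pattern can be modelled with a short C function. pattern_length() below is a hypothetical illustration, not flex's actual parser; applied to the question's first rule it splits off the pattern \n and leaves ++num_lines; ++num_chars; as the action.

```c
#include <stddef.h>

/* Hypothetical sketch: returns the length of the pattern at the start of a
 * rules-section line, i.e. the index of the first white space that is not
 * escaped with a backslash, inside "..." quotes, or inside a [...] class.
 * Everything after that white space is the action. Simplified: a literal
 * ']' at the start of a class is not handled. */
size_t pattern_length(const char *line)
{
    size_t i = 0;
    int in_quotes = 0, in_class = 0;

    while (line[i] != '\0') {
        char c = line[i];
        if (c == '\\' && line[i + 1] != '\0') {
            i += 2;                      /* skip the escaped character */
            continue;
        }
        if (in_quotes) {
            if (c == '"')
                in_quotes = 0;
        } else if (in_class) {
            if (c == ']')
                in_class = 0;
        } else if (c == '"') {
            in_quotes = 1;
        } else if (c == '[') {
            in_class = 1;
        } else if (c == ' ' || c == '\t') {
            break;                       /* unquoted white space ends the pattern */
        }
        ++i;
    }
    return i;
}
```

    By the same rule, a column-one line such as int num_lines = 0; placed in this section would be read as the pattern int followed by the action num_lines = 0;.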

    The third section has no layout restrictions, as its code is copied verbatim to the generated C file (lex.yy.c by default).

    There is one final syntactic feature that is useful. In the first section, C code that is not a definition but should be copied to the output can be placed between %{ and %}, each on a line of its own at the start of the line; indented lines in this section are copied verbatim as well. Further, in the rules (second) section, indented or %{ %}-enclosed code that appears before the first rule is copied into the scanning function, yylex(), where it can declare local variables.
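
    As a rough picture of where each piece of the input ends up, the hand-written C sketch below mimics the layout of the file flex generates for the question's scanner. It is not flex's actual output (the real yylex() is table-driven and reads from yyin); scan_string() and print_counts() are hypothetical stand-ins for yylex() and the question's main().

```c
#include <stdio.h>

/* From section 1: text between %{ and %} (or indented text) is copied here,
 * at file scope, so every function in the file can use these counters. */
int num_lines = 0, num_chars = 0;

/* From section 2: the rules become the body of the scanning function. */
int scan_string(const char *input)
{
    /* Indented text placed after %% but before the first rule would be
     * copied here, i.e. it would declare variables local to this function. */
    for (const char *p = input; *p != '\0'; ++p) {
        if (*p == '\n') {        /* rule:  \n   ++num_lines; ++num_chars; */
            ++num_lines;
            ++num_chars;
        } else {                 /* rule:  .    ++num_chars;              */
            ++num_chars;
        }
    }
    return 0;                    /* like yylex(), return 0 at end of input */
}

/* From section 3: copied verbatim after the scanner. In the question this
 * is main(); print_counts() is a hypothetical substitute. */
void print_counts(void)
{
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
}
```

    The placement is what matters here: because the counters sit at file scope, code from the third section can read them after scanning finishes.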

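    Your second file starts with a C variable declaration in column one, which violates these rules. A line that starts at the left margin of the first section is treated as a name definition, so flex reads "int" as a name whose pattern is "num_lines = 0, num_chars = 0;", and the declaration never reaches the generated C code. That is why the compiler then reports num_lines and num_chars as undeclared. Your first file works because its declaration is indented, and indented text in the first section is copied to the output verbatim.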
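
    The distinction can be sketched in C with a hypothetical classify_definition_line() that applies only the column-one rule described above. It ignores %{ %} blocks, %option lines, start conditions and comments, and does not check that the name is valid. Fed the first line of each of the question's files, it reproduces the difference.

```c
#include <ctype.h>
#include <string.h>

enum line_kind { BLANK_LINE, VERBATIM_CODE, NAME_DEFINITION };

/* Hypothetical model of one definitions-section line. For NAME_DEFINITION,
 * the name is copied into name (truncated to name_size - 1 characters;
 * name_size must be at least 1) and *def points at the pattern text. */
enum line_kind classify_definition_line(const char *line, char *name,
                                        size_t name_size, const char **def)
{
    if (line[0] == '\0')
        return BLANK_LINE;
    if (isspace((unsigned char)line[0]))
        return VERBATIM_CODE;       /* indented: copied to the C output as is */

    /* Column one: the first word is taken as a name ... */
    size_t word_len = strcspn(line, " \t");
    size_t copy_len = word_len < name_size ? word_len : name_size - 1;
    memcpy(name, line, copy_len);
    name[copy_len] = '\0';

    /* ... and the rest of the line, after the white space, as its pattern. */
    const char *p = line + word_len;
    while (*p == ' ' || *p == '\t')
        ++p;
    *def = p;
    return NAME_DEFINITION;
}
```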

    If you want to declare some variables in C which should be copied to the output, you can do it in the following manner:

    %{
    int num_lines = 0, num_chars = 0;
    %}
    %%
    \n      ++num_lines; ++num_chars;
    .       ++num_chars;
    

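    Or alternatively, like this. Be aware, though, that code placed here (indented, after %% and before the first rule) is copied into yylex(), so the variables become local to yylex(). The main() in your program could then no longer see them and would fail with the same undeclared-identifier errors: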

    %%
            int num_lines = 0, num_chars = 0;
    \n      ++num_lines; ++num_chars;
    .       ++num_chars;