bison  flex-lexer  lex

Format of lex/flex rules - Should Pattern and Action be on the same line?


I could not find any explanation (or I missed it) of how lex rules must be laid out relative to their actions. Here is an example:

  %%
  ^([ \r\t])*[abcd][^=].* 
              {  
                 return TOKEN1;
              }
  %%

As opposed to:

  %%
  ^([ \r\t])*[abcd][^=].* {  
                 return TOKEN1;
              }
  %%

I know that the %% must start at the beginning of a line, with no leading whitespace. My question is about the action part. Flex sometimes reports "warning, rule cannot be matched" when the pattern and the action are on different lines, as in the first example above. The warning goes away when they are placed on the same line. However, I have a similar rule that produces no warning even though its action starts on a new line.

I am using it with Bison, though that should not be relevant to the question.


Solution

  • From the Flex manual:

    5.2 Format of the Rules Section

    The rules section of the flex input contains a series of rules of the form:

    pattern   action
    

    where the pattern must be unindented and the action must begin on the same line.
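
    A minimal sketch of a scanner file laid out this way, assuming it is built on its own rather than alongside a Bison grammar. The enum value for TOKEN1 and the main() driver are stand-ins invented for this illustration; in a real project the token code would come from the Bison-generated header. The listing is shown with the same four-space margin as the surrounding text; in the actual file, %option, %{, %% and each pattern start in the first column.

    %option noyywrap nounput noinput
    %{
    #include <stdio.h>
    /* Hypothetical token code; normally supplied by Bison's header. */
    enum { TOKEN1 = 258 };
    %}
    %%
    ^[ \r\t]*[abcd][^=].*   {
                              return TOKEN1;
                            }
    .|\n                    ;
    %%
    /* Hypothetical driver, only so the file can be built and run alone. */
    int main(void)
    {
        int tok;
        while ((tok = yylex()) != 0)
            printf("token %d: %s\n", tok, yytext);
        return 0;
    }

    Only the opening brace has to share the line with the pattern; the rest of the braced action may continue on the following lines.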


    If you prefer the Posix specification for lex, there is a similar requirement:

    The rules in lex source files are a table in which the left column contains regular expressions and the right column contains actions (C program fragments) to be executed when the expressions are recognized.

    ERE action
    ERE action...
    

    The extended regular expression (ERE) portion of a row shall be separated from action by one or more <blank> characters.

    <blank> is defined in the Base Definitions volume as being either a space or tab character.


    Posix prohibits rule lines that have no action, although Flex accepts them and treats the action as though it were ;. Indented lines in the rules section are usually inserted verbatim into the output, but unless the indented lines come before the first rule, the result is undefined. At least, Posix just says that the result is undefined. Flex (and, I think, most lex implementations) copies the lines into the generated file, where they fall after the break; statement at the end of the action's case clause. That isn't a problem if the lines are comments, which is not uncommon. But actual executable code is likely to trigger an "unreachable code" warning, assuming you compile with warnings enabled.
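
    The one well-defined use of indented lines, placing them before the first rule, can be sketched as follows. The parenthesis-counting rules and the variable name depth are made up for this illustration. The listing carries the same four-space margin as the surrounding text, so in the real file %% and the patterns begin in the first column and only the declaration is indented.

    %%
            int depth = 0;  /* indented, before the first rule: a local of yylex() */
    "("     { depth++; }
    ")"     { depth--; }
    \n      { printf("depth at end of line: %d\n", depth); }
    .       ;

    Flex places such lines at the top of yylex(), so depth is initialized again each time the scanner is called.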

    However, Flex also allows start condition blocks, and inside a start condition block you're allowed to indent patterns. In that context, placing the action on a line by itself will cause Flex to treat it as a pattern rather than as inserted code.
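
    A sketch of such a start condition block, using a hypothetical exclusive condition named COMMENT that skips C-style comments. The listing carries the same four-space margin as the surrounding text; in the real file, %x, %%, "/*" and <COMMENT>{ begin in the first column, and only the two patterns inside the block are indented.

    %x COMMENT
    %%
    "/*"            BEGIN(COMMENT);
    <COMMENT>{
        "*/"        BEGIN(INITIAL);
        .|\n        ;
    }
    .|\n            ECHO;

    Because indentation does not mark inserted code inside the block, each action here stays on the same line as its pattern.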