Why is the last rule matching in my lex file when I have better rules?


I have a lex file with rules such as:

PROGRAM           return Parser::PROGRAM;
PROGRAM_END       return Parser::PROGRAM_END;
VARIABLES:        return Parser::VARIABLES;
INSTRUCTIONS:     return Parser::INSTRUCTIONS; 
SKIP              return Parser::SKIP;
.           {
                std::cerr << lineno() << ": ERROR." << std::endl;
                exit(1);
            }

When I run the fully compiled version (built together with the yacc file, etc.) on a test file, only this last rule is used, even though the test file is correct.

For example this is a test file for these rules:

PROGRAM fst
INSTRUCTIONS:
    SKIP
PROGRAM_END

For this file I only got: 1: ERROR.

Why is this, and how can I resolve this?


Solution

  • As indicated in the comments, it is almost certainly the case that PROGRAM is being recognised as a token and passed to the parser. In almost all cases, however, the parser will immediately request another token, and the next character in the input sequence is a space, which is matched by the last rule. That rule prints an error message and calls exit(), terminating the application. (That's not generally a good idea, but I suppose this is just a test program.) So that's all the output you'll get.

    If you specify the -d command-line argument when you invoke (f)lex, then a debugging scanner will be generated which reports the progress of the scanner as it works. That's a very easy way to see what is going on in your scanner. Bison also has a debugging mode, as explained in the bison manual. These tools are very simple to use, and come highly recommended.

    Here, for example, is a quick test rig:

    %{
    #include <iostream>
    #include <cstdlib>
    class Parser {
      public:
        enum Token {
          PROGRAM = 257,
          PROGRAM_END, VARIABLES, INSTRUCTIONS, SKIP
        };
    };
    %}
    %option batch noyywrap yylineno c++
    %%
    PROGRAM           return Parser::PROGRAM;
    PROGRAM_END       return Parser::PROGRAM_END;
    VARIABLES:        return Parser::VARIABLES;
    INSTRUCTIONS:     return Parser::INSTRUCTIONS; 
    SKIP              return Parser::SKIP;
    .                 {
                        std::cerr << lineno() << ": ERROR." << std::endl;
                        exit(1);
                      }
    %%
    int main() {
      yyFlexLexer lexer{};
      lexer.set_debug(1);
      while(lexer.yylex() != 0) { }
      return 0;
    }
    

    And a sample run:

    $ g++ lex.yy.cc && ./a.out<<<"PROGRAM fst"
    --(end of buffer or a NUL)
    --accepting rule at line 14("PROGRAM")
    --accepting rule at line 19(" ")
    1: ERROR.
    

    which makes it clear that the scanner did first produce the PROGRAM token, before exiting on the space character.
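
    As for resolving it, one option is to give the scanner a rule that consumes whitespace without returning a token, so that the catch-all rule only fires on genuinely unexpected characters. The following is a minimal sketch of the rules section under that assumption. Note that `Parser::IDENTIFIER` is a hypothetical token that does not appear in the original rules; it is added here only so that a name such as fst is accepted, and it would have to be declared in the parser as well.

    ```lex
    PROGRAM           return Parser::PROGRAM;
    PROGRAM_END       return Parser::PROGRAM_END;
    VARIABLES:        return Parser::VARIABLES;
    INSTRUCTIONS:     return Parser::INSTRUCTIONS;
    SKIP              return Parser::SKIP;
    [[:alpha:]_][[:alnum:]_]*   return Parser::IDENTIFIER; /* hypothetical token; listed after the keywords so they win ties */
    [ \t\r\n]+        /* whitespace: empty action, so no token is returned */
    .                 {
                        std::cerr << lineno() << ": ERROR." << std::endl;
                        exit(1);
                      }
    ```

    Because (f)lex prefers the longest match and, among equally long matches, the rule listed first, PROGRAM_END is still recognised as a single token and the keywords take precedence over the identifier rule.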