Search code examples
grammarbisonyacclex

Erratic parser. Same grammar, same input, cycles through different results. What am I missing?


I'm writing a basic parser that reads form stdin and prints results to stdout. The problem is that I'm having troubles with this grammar:

%token WORD NUM TERM

%%
stmt: /* empty */
    | word word   term { printf("[stmt]\n"); }
    | word number term { printf("[stmt]\n"); }
    | word   term
    | number term
    ;
word: WORD  { printf("[word]\n"); }
    ;
number: NUM { printf("[number]\n"); }
      ;
term: TERM { printf("[term]\n"); /* \n */}
      ;

%%

When I run the program, I and type: hello world\n The output is (as I expected) [word] [word] [term] [stmt]. So far, so good, but then if I type: hello world\n (again), I get syntax error [word][term].
When I type hello world\n (for the third time) it works, then it fails again, then it works, and so on and do forth.

Am I missing something obvious in here?

(I have some experience on hand rolled compilers, but I've not used lex/yacc et. al.)

This is the main func:

int main() {
    do {
        yyparse();
    } while(!feof(yyin));

    return 0;
}

Any help would be appreciated. Thanks!


Solution

  • Your grammar recognises a single stmt. Yacc/bison expect the grammar to describe the entire input, so after the statement is recognised, the parser waits for an end-of-input indication. But it doesn't get one, since you typed a second statement. That causes the parser to report a syntax error. But note that it has now read the first token in the second line.

    You are calling yyparse() in a loop and not stopping when you get a syntax error return value. So when you call yyparse() again, it will continue where the last one left off, which is just before the second token in the second line. What remains is just a single word, which it then correctly parses.

    What you probably should do is write your parser so that it accepts any number of statements, and perhaps so that it does not die when it hits an error. That would look something like this:

    %%
    prog: %empty
        | prog line
    line: stmt '\n'    { puts("Got a statement"); }
        | error '\n'   { yyerrok; /* Simple error recovery */ }
    ...
    

    Note that I print a message for a statement only after I know that the line was correctly parsed. That usually turns out to be less confusing. But the best solution is not use printf's, but rather to use Bison's trace facility, which is as simple as putting -t on the bison command line and setting the global variable yydebug = 1;. See Tracing your parser