I am writing a compiler in C, and I use bison for the grammar and flex for the tokens. To improve the quality of error messages, some common errors need to appear in the grammar. This has the side effect, however, of bison thinking that an invalid input is actually valid.
For example, consider this grammar:
program
: command ';' program
| command ';'
| command {yyerror("Missing ;.");} // wrong input
;
command
: INC
| DEC
;
where INC and DEC are tokens and program is the initial symbol. In this case, INC; is a valid program, but INC is not, and an error message is generated. The function yyparse(), however, returns 0 as if the program were correct.
Looking at the bison manual, I found the macro YYERROR, which should behave as if the parser itself found an error. But even if I add YYERROR after the call to yyerror(), the function yyparse() still returns 0. I could use YYABORT instead, but that would stop on the first error, which is terrible and not what I want.
Is there any way to make yyparse() return 1 without stopping on the first error?
Since you intend to recover from syntax errors, you're not going to be able to use the return code from yyparse to signal that one or more errors occurred. Instead, you'll have to track that information yourself.
The traditional way to do that would be to use a global error count (just showing the crucial pieces):
%{
int parse_error_count = 0;
%}
%%
program: statement { yyerror("Missing semicolon");
++parse_error_count; }
%%
int parse_interface() {
parse_error_count = 0;
int status = yyparse();
if (status) return status; /* Might have run out of memory */
if (parse_error_count) return 3; /* yyparse returns 0, 1 or 2 */
return 0;
}
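To see the counting pattern in isolation, here is a minimal sketch in plain C that does not need bison at all. Everything in it apart from parse_error_count and parse_interface is made up for illustration: toy_parse is a hand-written stand-in for yyparse(), loosely modelled on the grammar in the question (it is more lenient, since it also tolerates a missing semicolon between commands), and the token codes are hypothetical rather than taken from a generated header.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token codes; a real project would take these from the
   bison-generated header. */
enum token { TOK_END = 0, TOK_INC, TOK_DEC, TOK_SEMI };

int parse_error_count = 0;

/* Report an error and remember that it happened. */
static void report_error(const char *msg) {
    fprintf(stderr, "error: %s\n", msg);
    ++parse_error_count;
}

/* Stand-in for yyparse(): it recovers from every error it reports, so it
   returns 0 whenever it reaches the end of the input. */
static int toy_parse(const enum token *toks) {
    size_t i = 0;
    while (toks[i] != TOK_END) {
        if (toks[i] != TOK_INC && toks[i] != TOK_DEC) {
            report_error("Expected a command.");
            ++i;                 /* skip the offending token and go on */
            continue;
        }
        ++i;                     /* consume the command */
        if (toks[i] == TOK_SEMI)
            ++i;                 /* consume the ';' */
        else
            report_error("Missing ;.");
    }
    return 0;
}

/* Same shape as parse_interface above: the parser's own status wins,
   otherwise the error count decides. */
int parse_interface(const enum token *toks) {
    parse_error_count = 0;
    int status = toy_parse(toks);
    if (status) return status;
    if (parse_error_count) return 3;
    return 0;
}
```

Feeding it the tokens INC DEC reports two missing semicolons and still reads to the end of the input, but parse_interface returns 3 instead of 0.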
A more modern solution is to define an additional "out" parameter to yyparse:
%parse-param { int* error_count }
%%
program: statement { /* %parse-param arguments are passed to yyerror too */
yyerror(error_count, "Missing semicolon");
++*error_count; }
%%
int main() {
int error_count = 0;
int status = yyparse(&error_count);
if (status || error_count) { /* handle error */ }
}
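One thing to keep in mind with %parse-param: bison passes the extra argument to yyerror as well, ahead of the message, so your yyerror has to be declared to accept it. That opens up a variation you might prefer, sketched below: do the counting inside yyerror itself, so that syntax errors the parser detects on its own (and then recovers from through error rules) are tallied along with the ones your actions report. This assumes a parser without locations; if you go this way, drop the ++*error_count from the actions so nothing is counted twice.

```c
#include <stdio.h>

/* With %parse-param { int* error_count }, the generated parser calls
   yyerror(error_count, "syntax error"), so the definition takes the
   counter as its first parameter. */
void yyerror(int *error_count, const char *msg) {
    fprintf(stderr, "error: %s\n", msg);
    ++*error_count;   /* one place that counts every reported error */
}
```

An action would then just call yyerror(error_count, "Missing semicolon"); and leave the bookkeeping to the function.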
Finally, in case you really need to export the symbol yyparse from your bison-generated code, you can do the following ugly hack:
%code top {
#define yyparse internal_yyparse
}
%parse-param { int* error_count }
%%
program: statement { /* %parse-param arguments are passed to yyerror too */
yyerror(error_count, "Missing semicolon");
++*error_count; }
%%
#undef yyparse
int yyparse() {
int error_count = 0;
int status = internal_yyparse(&error_count);
// Whatever you want to do as a summary
return status ? status : error_count ? 1 : 0;
}