c++ multithreading bison flex-lexer reentrancy

Reentrant Bison/Flex, how to get error message for each instance of yyscan_t


I'm trying to create a program that uses multithreading with flex/bison to parse large amounts of data. I am slightly lost on how to get yyerror in a reentrant way.

In a previous non-reentrant test with bison/flex, I used extern to get yyerror:

extern void yyerror(const char*);

void yyerror(const char* msg) {
    std::cout << " Error: " + std::string(msg) << std::endl;
    // ...
    // call the appropriate code to handle the error, etc.
    // ...
}

Now I'm trying to implement this using reentrant bison and flex.

Using the example code from the user @rici's answer to "Thread-safe / reentrant bison + flex", I'm trying to understand how I could get the error message after yyparse is called. How could I implement the following?


class container {

public:

bool errorOccured;
std::string errorMessage;

void parse() {
    yyscan_t scanner;
    yylex_init(&scanner);
    yy_scan_string("123 + + 123 \n", scanner);
    yyparse(scanner);
    yylex_destroy(scanner);
    //errorOccured = ?;
    //errorMessage = ?;
}

bool checkIfErrorOccured() {
    std::cout << errorMessage << std::endl;
    return errorOccured;
}

};

For reference, here is the lex code I am using, written by the user @rici (from "Thread-safe / reentrant bison + flex"):

%option noinput nounput noyywrap 8bit nodefault                                 
%option yylineno
%option reentrant bison-bridge bison-locations                                  

%{
  #include <stdlib.h>                                                           
  #include <string.h>
  #include "parser.tab.h"                                                   

  #define YY_USER_ACTION                                             \
    yylloc->first_line = yylloc->last_line;                          \
    yylloc->first_column = yylloc->last_column;                      \
    if (yylloc->last_line == yylineno)                               \
      yylloc->last_column += yyleng;                                 \
    else {                                                           \
      yylloc->last_line = yylineno;                                  \
      yylloc->last_column = yytext + yyleng - strrchr(yytext, '\n'); \
    }
%}                                                                              
%%
[ \t]+            ;                                                  
#.*               ;                                                  

[[:digit:]]+      *yylval = strtol(yytext, NULL, 0); return NUMBER;  

.|\n              return *yytext;

The bison code:

%define api.pure full
%define parse.error verbose
%locations
%param { yyscan_t scanner }

%code top {
  #include <stdio.h>
  #include <string.h>
} 
%code requires {
  typedef void* yyscan_t;
}
%code {
  int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t scanner);
  void yyerror(YYLTYPE* yyllocp, yyscan_t unused, const char* msg);
}

%token NUMBER UNOP
%left '+' '-'
%left '*' '/' '%'
%precedence UNOP
%%
input: %empty
     | input expr '\n'      { printf("[%d]: %d\n", @2.first_line, $2); }
     | input '\n'
     | input error '\n'     { yyerrok; }
expr : NUMBER
     | '(' expr ')'         { $$ = $2; }
     | '-' expr %prec UNOP  { $$ = -$2; }
     | expr '+' expr        { $$ = $1 + $3; }
     | expr '-' expr        { $$ = $1 - $3; }
     | expr '*' expr        { $$ = $1 * $3; }
     | expr '/' expr        { $$ = $1 / $3; }
     | expr '%' expr        { $$ = $1 % $3; }

%%

void yyerror(YYLTYPE* yyllocp, yyscan_t unused, const char* msg) {
  fprintf(stderr, "[%d:%d]: %s\n",
                  yyllocp->first_line, yyllocp->first_column, msg);
}



Solution

  • If a parse fails, yyparse returns a non-zero value. That's the same for both reentrant and non-reentrant parsers. So you should always collect the return value from yyparse:

    status = yyparse(scanner);
    

    If you're doing error recovery (that is, you have one or more error productions), then you will have to keep a count of errors yourself. The yyparse error return only occurs if error recovery fails (or if there is a memory allocation error).
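As a rough sketch of that bookkeeping (the names ParseOutcome and classify are hypothetical, not part of bison or flex): yyparse returns 0 on success, 1 on invalid input or YYABORT, and 2 on memory exhaustion, so a caller could combine that status with its own error counter roughly like this. With the grammar in the question, an input such as "123 + + 123 \n" should recover through the "input error '\n'" production, so yyparse would be expected to return 0 even though yyerror was called once.

```cpp
#include <cassert>

// Hypothetical result type; only the meaning of the yyparse status
// values (0, 1, 2) comes from bison itself.
enum class ParseOutcome { Clean, RecoveredErrors, Failed, OutOfMemory };

// `status` is the value returned by yyparse. `error_count` is a counter
// the application maintains itself, e.g. incremented from yyerror.
ParseOutcome classify(int status, int error_count) {
    assert(error_count >= 0);
    if (status == 2) return ParseOutcome::OutOfMemory;  // memory exhaustion
    if (status != 0) return ParseOutcome::Failed;       // recovery failed or YYABORT
    // yyparse "succeeded", but errors may still have been reported and recovered.
    return error_count > 0 ? ParseOutcome::RecoveredErrors : ParseOutcome::Clean;
}
```

The point of the third case is that a zero status alone does not mean the input was error-free once error productions are in play.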

    yyerror is called when an error is detected (before error recovery is tried). In toy examples, it usually just prints its argument to stderr. (In the default configuration, the argument is "syntax error", but you can get better error messages with %define parse.error verbose.) In a production parser with error recovery, yyerror might do nothing, leaving it to the error recovery procedure to try to produce a more meaningful error message. Or it might store bison's error message somewhere for future reference.

    There's no huge problem with printing to stderr since the yyerror call executes synchronously in the same thread as the parser (bison is completely thread-unaware). But some applications prefer to put the messages into some kind of data structure for later processing. (You definitely will want to consider this in a multithreaded application.) To facilitate that, as you can see in my code, yyerror is called with the same additional parameters as yyparse.

    In the sample code, this feature was not used (which is why the yyscan_t argument is called unused). But since flex allows you to extend the scanner context object with extra data, that would be a sensible place to put an error collector, so it will prove useful that yyerror has access to it. (It's also available in any parser action, of course, since it's a parameter of yyparse.)

    Maybe it's confusing that I put the yyerror definition in the scanner file, rather than the parser file. Since it's an external function, it doesn't much matter which translation unit it goes in. Putting it in the parser is probably what you will mostly see in examples, but it also makes a lot of sense to define it in the translation unit which calls the parser.

    Putting it in the scanner is at best eccentric. I did that solely to avoid the hassle of the circular dependency issue which I describe in some detail in the linked answer, so I won't repeat that here.

    Circular dependency is not going to be a problem in any translation unit other than the code generated by bison. If you want to use the extra-data technique mentioned above, you'll want to ask flex to generate a header file and make sure you #include that header in the file where yyerror is defined.
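To make the extra-data idea concrete, here is a minimal sketch, not a definitive implementation. ErrorCollector and record_error are hypothetical names of my own. The flex calls mentioned in the comments (yylex_init_extra, yyget_extra) are real, but they need the flex-generated header, which you could request with something like %option header-file="lexer.h" (the file name is an assumption). For that reason the wiring appears only as comments, and the part that compiles on its own is the collector itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-parse error collector. Each thread creates its own
// instance and attaches it to its own scanner, so nothing is shared.
struct ErrorCollector {
    struct Entry { int line; int column; std::string message; };
    std::vector<Entry> entries;
    bool has_errors() const { return !entries.empty(); }
};

// Hypothetical helper that yyerror forwards to. `extra` is the pointer
// returned by yyget_extra(scanner), which is void* by default.
void record_error(void* extra, int line, int column, const char* msg) {
    assert(extra != nullptr);
    auto* errors = static_cast<ErrorCollector*>(extra);
    errors->entries.push_back({line, column, msg});
}

// Wiring (comments only, since it depends on the generated headers):
//
//   void yyerror(YYLTYPE* yyllocp, yyscan_t scanner, const char* msg) {
//     record_error(yyget_extra(scanner),
//                  yyllocp->first_line, yyllocp->first_column, msg);
//   }
//
//   ErrorCollector errors;
//   yyscan_t scanner;
//   yylex_init_extra(&errors, &scanner);   // instead of yylex_init
//   yy_scan_string("123 + + 123 \n", scanner);
//   int status = yyparse(scanner);
//   yylex_destroy(scanner);
//   // errors.has_errors() and errors.entries are now available.
```

In the container class from the question, errorOccured and errorMessage could then be filled from the collector after yyparse returns. Storing every message rather than only the last one is a design choice, and it matters once error recovery lets a single parse report several errors.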