Passing an argument to yylex

I'm trying to alter my .l and .y files to make my scanner and parser reentrant. Following the GNU documentation et al, I've put this in my .y file:

%define api.pure
%lex-param { YYSTYPE *yylval }
%parse-param { astNodePtr *programTree }

In addition, I've added the reentrant and bison-bridge options to my .l file.

However, after I build parser.tab.h and parser.tab.c with bison -dv parser.y, I notice that parser.tab.c contains the declaration

int yylex(void);

and yet later involves calls such as

yychar = yylex (&yylval, yylval);

Furthermore, attempting to compile the .c file which flex creates generates all sorts of errors centered on some variable called yyg.

Are there additional flags/whatever I need to add to either my .l or .y file?

Solution

Remember that flex and bison are two completely separate processors which take two completely separate input files and generate two completely separate programs, which are then compiled separately. Bison and flex do not read each other's input files (or output files), nor do they read the programmer's mind. So any interoperability between the generated programs is there because you, as the programmer, arranged for it.

For default (not reentrant) generated scanners and parsers, you arrange for interoperability by

1. Making sure that the header generated by bison is #included in the scanner generated by flex, and
2. Declaring the prototype of yylex in the parser generated by bison.

The prototype of yylex is normally

int yylex(void);

You can get flex to generate yylex with a different prototype, or even a different name, but if you do that you need to tell bison how to call yylex (since it has no way to know that you've altered the prototype in your scanner).

If you find the declaration int yylex(void); in your parser.tab.c file, it is almost certainly because you put it in a code block in your parser.y file. That was necessary in order to compile the non-reentrant parser, and you forgot to remove it when you modified the file to produce a reentrant parser.
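For reference, the default (non-reentrant) wiring usually looks something like the sketch below. The file names parser.y, scanner.l and parser.tab.h are assumed from the build command in the question, and the NUMBER token is a hypothetical placeholder.

```yacc
/* parser.y: prologue of a non-reentrant parser */
%{
#include <stdio.h>
int yylex(void);                /* copied verbatim into parser.tab.c */
void yyerror(const char *msg);
%}
%token NUMBER                   /* hypothetical token */

/* scanner.l: definitions section of the matching scanner */
%{
#include "parser.tab.h"         /* token codes and the global yylval */
%}
```

The hand-written int yylex(void); line is the one that has to go (or change) once the scanner's calling convention changes.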

By default, the parser and the lexer intercommunicate using the global variable yylval to hold the semantic value of the lexical token. (And the global variable yylloc if locations are also being communicated.) Bison defines these global variables in the generated parser and declares them in the generated header, so as long as the generated header is #included into the generated scanner, the two programs can interoperate.

But that's really not a good way to share data, and modern coding styles frown on such uses of global variables. By generating a reentrant parser, you can enable an alternative mechanism, in which the parser passes the scanner pointers to its own local yylval (and yylloc, if used). That solves the reentrancy issue, but now you need to communicate that desire to flex, so that it will generate a yylex which expects those arguments. The way you do that is to insert %option bison-bridge (and possibly %option bison-locations) into your flex input file.
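A minimal sketch of the flex side, assuming the bison file uses %define api.pure as in the question. The NUMBER token and the number member of the semantic value type are hypothetical names. The point to notice is that with bison-bridge, yylval inside the actions is a pointer to the parser's value rather than a global object, so members are reached with -> instead of a dot.

```lex
/* scanner.l */
%option bison-bridge
%{
#include <stdlib.h>
#include "parser.tab.h"   /* YYSTYPE and the token codes come from bison */
%}
%%
[0-9]+    { yylval->number = atoi(yytext); return NUMBER; }
[ \t\n]+  { /* skip whitespace */ }
.         { return yytext[0]; }
```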

Doing that will adjust the calling convention for yylex, but it does not actually produce a reentrant scanner. It just produces a scanner which does not rely on globals to communicate with the parser. The scanner still relies on lots of other globals to maintain its own state. If you want a reentrant scanner, you also need to insert the %option reentrant declaration into the flex file, which will cause it to generate a scanner which keeps its "global" state in a context object of opaque type yyscan_t. That context object has to be passed to yylex as an argument (which comes at the end of yylex's argument list). The %option reentrant declaration produces a yylex which expects that argument, but now bison is out of the loop; once again, it becomes your responsibility to communicate that fact to bison.
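To make the idea of a context object concrete, here is a small hand-written stand-in in plain C. It is not flex output: mock_yylex, struct scanner_state, the token codes and the YYSTYPE union are all hypothetical, and only the argument order (semantic value first, scanner context last) mirrors what a scanner built with both options expects.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical stand-ins: a real build gets YYSTYPE from bison and
   yyscan_t (plus the state behind it) from flex. */
typedef union { int number; } YYSTYPE;
typedef void *yyscan_t;

/* State that a non-reentrant scanner would keep in globals. */
struct scanner_state {
    const char *cursor;   /* next unread input character */
};

enum { TOKEN_EOF = 0, TOKEN_NUMBER = 258 };

/* Semantic value first, scanner context last. */
int mock_yylex(YYSTYPE *yylval_param, yyscan_t yyscanner)
{
    struct scanner_state *state = yyscanner;

    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)*state->cursor))
        state->cursor++;
    if (*state->cursor == '\0')
        return TOKEN_EOF;
    if (isdigit((unsigned char)*state->cursor)) {
        int value = 0;
        while (isdigit((unsigned char)*state->cursor))
            value = value * 10 + (*state->cursor++ - '0');
        yylval_param->number = value;   /* written through the pointer */
        return TOKEN_NUMBER;
    }
    /* Any other character is returned as its own token code. */
    return (unsigned char)*state->cursor++;
}
```

Because every piece of state arrives through the arguments, two scanners over two different inputs can be advanced alternately without disturbing each other, which is exactly what the global-based version cannot do.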

It also becomes your responsibility to allocate a yyscan_t object which can be passed to the lexer. But you don't call the scanner directly; you call the parser (yyparse), and it calls the scanner when necessary.

Flex allows you to add arbitrary code into the scanner (by placing it, indented, before the first rule). But, unfortunately, bison has no such facility. The only way you can inject a new variable into the parser is by adding it to the yyparse argument list (using the %parse-param declaration). So you need to create the yyscan_t object yourself, and pass it to yyparse. Then you need to tell bison to use that object when it calls yylex, which you do with the %lex-param declaration.
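Put together, the bison side might look roughly like this. The parameter name scanner is my own choice, the grammar is elided, and the fragment deliberately leaves open how yyscan_t and the yylex_init/yylex_destroy prototypes become visible to the parser, since that is a separate problem.

```yacc
/* parser.y */
%define api.pure
%parse-param { yyscan_t scanner }   /* adds a parameter to yyparse */
%lex-param   { yyscan_t scanner }   /* forwards the same name to yylex */

%%
/* ... grammar rules ... */
%%

int main(void)
{
    yyscan_t scanner;
    if (yylex_init(&scanner) != 0)   /* allocate the scanner context */
        return 1;
    int status = yyparse(scanner);   /* the parser calls yylex(&yylval, scanner) */
    yylex_destroy(scanner);          /* release the scanner context */
    return status;
}
```

Note that anything added with %parse-param is also passed to yyerror, so its prototype becomes void yyerror(yyscan_t scanner, const char *msg).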

It should be reasonably clear that it will almost always be the case that the %parse-param and %lex-param declarations mentioned above will be identical. The only way for the parameter to be passed through yyparse to yylex is if the parameter added with %parse-param has the same name as the argument added with %lex-param. Since that is the case, bison very sensibly allows you to combine %parse-param and %lex-param into a single %param declaration.
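As a sketch, the combined form is a single line. The parameter name scanner is hypothetical, and as far as I know %param needs a reasonably recent bison (3.0 or later).

```yacc
/* parser.y */
%define api.pure

/* Same effect as writing the identical declaration once with
   %parse-param and once with %lex-param. */
%param { yyscan_t scanner }
```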

Now you only have one small problem. The parameter you need to pass through yyparse into yylex has the opaque type yyscan_t. Since that parameter is a parameter of yyparse, the type yyscan_t needs to be visible in the generated bison header. However, yylex is called (and thus must be declared) with other parameters, of type YYSTYPE* and YYLTYPE*, and these types are declared in the bison-generated header. Flex can also generate a header, but that won't help you because there is a circular header dependency between the generated files. (That's a natural consequence of the type coupling between supposedly independent scanners and parsers. But I won't develop that critique any further because it's basically a given.)

There are two workarounds. The best one, IMHO, is to avoid the circular header dependency by using a push parser instead of a pull parser. In the push parser model, the parser is called by the scanner (or called by the driver which passes the token produced by the scanner) instead of calling the scanner. That's the model I always recommend.
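A rough sketch of the push arrangement, using bison's push interface (yypstate_new, yypush_parse, yypstate_delete). The header name scanner.h is an assumption: it would be produced by adding %option header-file="scanner.h" to the flex file. The grammar is elided, and the scanner is assumed to be built with the reentrant and bison-bridge options.

```yacc
/* parser.y */
%define api.pure full
%define api.push-pull push

%code {
  /* Goes only into parser.tab.c, after YYSTYPE is defined, so the
     flex header can be included here without creating a cycle. */
  #include "scanner.h"
  void yyerror(const char *msg);
}

%%
/* ... grammar rules ... */
%%

int main(void)
{
    yyscan_t scanner;
    if (yylex_init(&scanner) != 0)
        return 1;
    yypstate *parser = yypstate_new();
    if (parser == NULL) {
        yylex_destroy(scanner);
        return 1;
    }
    int status;
    do {
        YYSTYPE value;
        int token = yylex(&value, scanner);            /* the driver calls the scanner */
        status = yypush_parse(parser, token, &value);  /* and feeds the parser */
    } while (status == YYPUSH_MORE);
    yypstate_delete(parser);
    yylex_destroy(scanner);
    return status;
}
```

Since the parser never calls yylex itself, it needs neither %lex-param nor any knowledge of yyscan_t.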

The other one is to resolve the circular dependency by manually declaring yyscan_t:

typedef void* yyscan_t;
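One place to put that typedef is a %code requires block, which bison copies into the generated header as well as the generated parser. This is only a sketch: the guard macro is, as far as I know, the one the flex-generated header wraps around its own typedef, which avoids a duplicate definition if both headers end up included in the same file. The parameter name scanner is hypothetical.

```yacc
/* parser.y */
%code requires {
  /* Emitted into parser.tab.h too, so yyscan_t is known wherever
     yyparse is declared. */
  #ifndef YY_TYPEDEF_YY_SCANNER_T
  #define YY_TYPEDEF_YY_SCANNER_T
  typedef void *yyscan_t;
  #endif
}

%define api.pure
%param { yyscan_t scanner }
```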

For a fully worked out (and documented) example, see this answer