I'm learning to use flex and I've come up with a question for which I haven't found an answer (not even in the reference). Suppose I have this code:
patt1 { do_foo(42); }
patt2 { do_bar(); }
This will probably work right. The problem is, do_foo might need to receive an argument by reference (say, an int) and do something (foo, actually) with it. The only way I can think of for do_foo to reach that variable is by declaring it as a global variable, but depending on the scope that code runs in, there may be another (cleaner, better) solution.
Any ideas? Any help will be greatly appreciated.
Thanks in advance.
In effect, the generated scanner looks something like this, leaving out a lot of details mostly having to do with buffer management:
int yylex() {
    /* A bit of setup */
    while (1) {
        do {
            yy_current_state = next_state(yy_current_state, get_next_char());
        } while (has_no_action(yy_current_state));
        yy_act = yy_accept[yy_current_state];
        switch (yy_act) {
            case 1: /* First action block */
                break;
            case 2: /* Second action block */
                break;
            /* etc. */
        }
    }
}
So it's easy to see where the actions go, and the scope they are in. For that information to be useful, you need to see the hooks which can be inserted, so let's write that out again with some explicit hooks:
YY_DECL {
    /* Some declarations */
    /******** Prelude block *********/
    /* A bit of setup */
    while (1) {
        do {
            yy_current_state = next_state(yy_current_state, get_next_char());
        } while (has_no_action(yy_current_state));
        yy_act = yy_accept[yy_current_state];
        switch (yy_act) {
            case 1: YY_USER_ACTION /**** User defined macro ****/
                /******** First action block *********/
                YY_BREAK
            case 2: YY_USER_ACTION
                /******** Second action block *********/
                YY_BREAK
            /* etc. */
        }
    }
}
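To make the scoping concrete, here is a hand-written stand-in for that skeleton in plain C. It is not flex output: mock_yylex, count_numbers and the crude digit/non-digit "matcher" are invented for illustration, and do_foo is a hypothetical version of the function from the question. All it is meant to show is that a parameter introduced through YY_DECL, and a local declared where the prelude block goes, are both in scope inside every action.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the question's do_foo(): takes an int by reference. */
static void do_foo(int *n) { ++*n; }

/* Same idea as in a flex file: replace the default prototype. */
#define YY_DECL int mock_yylex(const char *input, int *counter)
#define YY_USER_ACTION /* nothing */
#define YY_BREAK break;

YY_DECL
{
    /* "Prelude block": a local that every action can see. */
    int skipped = 0;

    assert(input != NULL && counter != NULL);
    while (*input != '\0') {
        int yy_act;

        /* Stand-in for the DFA loop: pick a rule and consume its match. */
        if (isdigit((unsigned char)*input)) {
            while (isdigit((unsigned char)*input))
                ++input;
            yy_act = 1;
        } else {
            ++input;
            yy_act = 2;
        }

        switch (yy_act) {
        case 1: YY_USER_ACTION
            do_foo(counter);      /* the action sees the parameter */
            YY_BREAK
        case 2: YY_USER_ACTION
            ++skipped;            /* ...and the prelude-block local */
            YY_BREAK
        }
    }
    return skipped;
}

/* Usage: the caller owns the int, so no global is needed. */
int count_numbers(const char *text)
{
    int numbers = 0;
    mock_yylex(text, &numbers);
    return numbers;
}
```

The caller passes the address of its own int, and the action for rule 1 hands that pointer straight on to do_foo.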
One of the most interesting features is the "prelude block". In your (f)lex input file, it looks like this:
%option ...
%%
    /* Prelude block: indented lines before the first pattern */
    int locvar = 0;
patt1 { /* first action block */ }
patt2 { /* second action block */ }
The macros all have sensible defaults:
/* The default definition of YY_DECL will be different if you've
* asked for a reentrant lexer
*/
#ifndef YY_DECL
extern int yylex(void);
#define YY_DECL int yylex(void)
#endif
/* Code executed at the beginning of each rule, after yytext and yyleng
* have been set up.
*/
#ifndef YY_USER_ACTION
#define YY_USER_ACTION
#endif
/* Code executed at the end of each rule. */
#ifndef YY_BREAK
#define YY_BREAK break;
#endif
For your purposes, the most interesting of these is YY_DECL. If you want to pass arguments to yylex, you can modify the prototype by defining this macro. If you also need local variables during the yylex invocation, you can declare them in the prelude block. (This is more useful for "push" lexers, but it has its uses even for normal lexers.)
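Applied to the question, that could look something like the sketch below. The bodies of do_foo and do_bar, the parameter name counter, the local bars and the two patterns are all made up for illustration; only YY_DECL, the indented prelude block and the %option names are flex features.

```lex
%{
#include <stdio.h>

/* Hypothetical stand-ins for the functions in the question. */
static void do_foo(int *n) { ++*n; }
static void do_bar(void)   { }

/* Give yylex a parameter; every action can then use `counter`. */
#define YY_DECL int yylex(int *counter)
%}

%option noyywrap nounput noinput

%%
    /* Prelude block (indented): locals for this call of yylex. */
    int bars = 0;

[0-9]+      { do_foo(counter); }
[a-z]+      { do_bar(); ++bars; }
.|\n        { /* ignore everything else */ }
<<EOF>>     { return bars; }
%%

int main(void)
{
    int foos = 0;                /* lives in the caller, not in a global */
    int bars = yylex(&foos);
    printf("foo: %d, bar: %d\n", foos, bars);
    return 0;
}
```

Because defining YY_DECL yourself also suppresses the default extern declaration, any other file that calls yylex needs to see the new prototype; here main simply comes after the generated definition in the same file.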
The YY_USER_ACTION and YY_BREAK macros are even more specialized. While they both look like they might be useful for debugging, you are generally much better off using flex's built-in trace facility (the -d flag or %option debug). The YY_USER_ACTION macro is useful if you want to track column positions and not just line numbers; you can probably find examples of using it for this purpose. The YY_BREAK macro can be set to nothing (rather than break) for the case where your compiler complains about a break following a return statement.
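For what it's worth, a minimal sketch of the column-tracking use might look like this. The variables cur_col and tok_col are names I made up (they are not flex built-ins), and the counting is naive: it counts bytes, treats a tab as one column, and assumes no token other than the newline rule contains a newline.

```lex
%{
#include <stdio.h>

/* cur_col and tok_col are illustrative names, not flex built-ins. */
static int cur_col = 1;   /* column of the next unread character */
static int tok_col = 1;   /* column where the current match starts */

/* Runs before every action, once yytext and yyleng are set. */
#define YY_USER_ACTION  tok_col = cur_col; cur_col += yyleng;
%}

%option noyywrap nounput noinput yylineno

%%
\n            { cur_col = 1; }
[A-Za-z_]+    { printf("%d:%d %s\n", yylineno, tok_col, yytext); }
.             { /* ignore everything else */ }
%%

int main(void)
{
    yylex();
    return 0;
}
```

Since the macro runs before the action, the newline rule can simply reset cur_col afterwards.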
Another macro, not indicated in the above code, is YY_USER_INIT, which will be incorporated in the one-time initialization code (also not shown above, sorry).
Most of these features are documented in the flex manual. YY_DECL is in Section 9 ("The Generated Scanner"); YY_USER_ACTION and YY_USER_INIT are in Section 13 ("Miscellaneous Macros"), along with some other features. (YY_BREAK is described at the very end of that section.)
The prelude block is a POSIX feature, so it is available in lex as well, and is documented in POSIX (as well as Section 5.2 of the Flex manual):
Any such input (beginning with a <blank> or within %{ and %} delimiter lines) appearing at the beginning of the Rules section before any rules are specified shall be written to lex.yy.c after the declarations of variables for the yylex() function and before the first line of code in yylex(). Thus, user variables local to yylex() can be declared here, as well as application code to execute upon entry to yylex().