I'm writing an ansi-C parser in C++ with flex and bison; it's pretty complex.
The issue I'm having is a compilation error. The error is below, it's because yy_terminate
returns YY_NULL
which is defined as (an int) 0
and yylex
has the return type of yy::AnsiCParser::symbol_type
. yy_terminate();
is the automatic action for the <<EOF>>
token in scanners generated by flex. Obviously this causes a type issue.
My scanner doesn't produce any special token for the EOF, because EOF has no purpose in a C grammar. I could create a token-rule for the <<EOF>>
but if I ignore it then the scanner hangs in an infinite loop in yylex
on the YY_STATE_EOF(INITIAL)
case.
The compilation error,
ansi-c.yy.cc: In function ‘yy::AnsiCParser::symbol_type yylex(AnsiCDriver&)’:
ansi-c.yy.cc:145:17: error: could not convert ‘0’ from ‘int’ to ‘yy::AnsiCParser::symbol_type {aka yy::AnsiCParser::basic_symbol<yy::AnsiCParser::by_type>}’
ansi-c.yy.cc:938:30: note: in expansion of macro ‘YY_NULL’
ansi-c.yy.cc:1583:2: note: in expansion of macro ‘yyterminate’
Also, Bison generates this rule for my start-rule (translation_unit) and the EOF ($end).
$accept: translation_unit $end
So yylex
has to return something for the EOF or the parser will never stop waiting for input, but my grammar cannot support an EOF token. Is there a way to make Bison recognize something other then 0
for the $end
condition without modifying my grammar?
Alternatively, is there simply something I can return from the <<EOF>>
token in the scanner to satisfy the Bison $end
condition?
Normally, you would not include an explicit EOF rule in a lexical analyzer, not because it serves no purpose, but rather because the default is precisely what you want to do. (The purpose it serves is to indicate that the input is complete; otherwise, the parser would accept the valid prefix of certain invalid programs.)
Unfortunately, the C++ interfaces can defeat the simple convenience of the default EOF action, which is to return 0 (or NULL). I assume from your problem description that you have asked bison to generate a parser using complete symbols. In that case, you cannot simply return a 0 from yylex
since the parser is expecting a complete symbol, which is a more complex type than int
(Although the token which reports EOF does not normally have a semantic value, it does have a location, if you are using locaitons.) For other token types, bison will have automatically generated a function which makes an token, named something like make_FOO_TOKEN
, which you will call in your scanner action for a FOO_TOKEN
.
While the C bison parser does automatically define the end of file token (called END
), it appears that the C++ interface does not. So you need to manually define it in your %token
declaration in your bison input file:
%token END 0 "end of file"
(That defines the token type END
with an integer value of 0 and the human readable label "end of file". The value 0 is obligatory.)
Once you've done that, you can add an explicit EOF rule in your flex input file:
<<EOF>> return make_END();
If you are using locations, you'll have to give make_END
a location argument as well.