I'm using GNU Bison 2.4.2 to write a grammar for a new language I'm working on and I have a question. When I specify a rule, let's say:
statement : T_CLASS T_IDENT '{' T_CLASS_MEMBERS '}' {
// create a node for the statement ...
}
and I then need a variation on that rule, for instance:
statement : T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST '{' T_CLASS_MEMBERS '}' {
// create a node for the statement ...
}
where (from the flex scanner rules):
"class" return T_CLASS;
"extends" return T_EXTENDS;
[a-zA-Z_][a-zA-Z0-9_]* return T_IDENT;
(and T_IDENT_LIST is a rule for comma separated identifiers).
Is there any way to specify all of this in a single rule, somehow marking the "T_EXTENDS T_IDENT_LIST" part as optional? I've already tried:
T_CLASS T_IDENT (T_EXTENDS T_IDENT_LIST)? '{' T_CLASS_MEMBERS '}' {
// create a node for the statement ...
}
But Bison gave me an error.
Thanks
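To make a long story short, no. Bison's grammar syntax is plain BNF, so it has no "?" operator and no grouping parentheses for optional elements; that, rather than the one symbol of lookahead its default LALR(1) parsers use, is why your attempt is rejected. What you need is an extra nonterminal with an empty alternative, something like this: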
statement: T_CLASS T_IDENT extension_list '{' ...
extension_list: /* empty */
| T_EXTENDS T_IDENT_LIST
;
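For completeness, here is a minimal sketch of how that might look as a self-contained grammar file. Several things in it are assumptions of mine, not taken from your grammar: T_CLASS_MEMBERS is stubbed out as an empty rule, semantic values are left as the default int, and extension_list simply yields 1 or 0 so the statement action can tell whether an extends clause was present. The scanner (yylex) is assumed to come from your flex file and to return '{', '}' and ',' as single-character tokens.

```yacc
%{
#include <stdio.h>

int yylex(void);                      /* assumed to be provided by the flex scanner */
void yyerror(const char *msg) { fprintf(stderr, "parse error: %s\n", msg); }
%}

%token T_CLASS T_EXTENDS T_IDENT

%%

statement
    : T_CLASS T_IDENT extension_list '{' T_CLASS_MEMBERS '}'
        {
            /* create a node for the statement ...
               $3 is 1 if an extends clause was present, 0 otherwise */
            printf("class %s an extends clause\n", $3 ? "with" : "without");
        }
    ;

/* The optional part: either nothing at all, or "extends a, b, c". */
extension_list
    : /* empty */                 { $$ = 0; }
    | T_EXTENDS T_IDENT_LIST      { $$ = 1; }
    ;

/* Comma-separated identifiers, as described in the question. */
T_IDENT_LIST
    : T_IDENT
    | T_IDENT_LIST ',' T_IDENT
    ;

/* Placeholder only: the real class-member rules go here. */
T_CLASS_MEMBERS
    : /* empty */
    ;

%%

int main(void)
{
    return yyparse();
}
```

The parser reduces the empty alternative when the next token is '{' and shifts when it is T_EXTENDS, so one token of lookahead is enough to choose between the two and this should not introduce any conflicts.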
There are other parser generators that accept extended (EBNF-style) notation, though, and some of them support optional elements more or less directly, as you're asking for. ANTLR, for instance, allows a "?" suffix on a parenthesized group.