Search code examples
parsingyaccshift-reduce-conflictgppg

Help with Shift/Reduce conflict - Trying to model (X A)* (X B)*


Im trying to model the EBNF expression

("declare" "namespace" ";")* ("declare" "variable" ";")*

I have built up the yacc (Im using MPPG) grammar, which seems to represent this, but it fails to match my test expression.

The test case i'm trying to match is

declare variable;

The Token stream from the lexer is

KW_Declare
KW_Variable
Separator

The grammar parse says there is a "Shift/Reduce conflict, state 6 on KW_Declare". I have attempted to solve this with "%left PrologHeaderList PrologBodyList", but neither solution works.

Program                     : Prolog;
Prolog                      : PrologHeaderList PrologBodyList;

PrologHeaderList            : /*EMPTY*/
                            | PrologHeaderList PrologHeader;
PrologHeader                : KW_Declare KW_Namespace Separator;

PrologBodyList              : /*EMPTY*/
                            | PrologBodyList PrologBody;
PrologBody                  : KW_Declare KW_Variable Separator;

KW_Declare KW_Namespace KW_Variable Separator are all tokens with values "declare", "naemsapce", "variable", ";".


Solution

  • It's been a long time since I've used anything yacc-like, but here are a couple of suggestions that may or may not help.

    It seems that you need a 2-token lookahead in this situation. The parser gets to the last PrologHeader, and it has to decide whether the next construct is a PrologHeader or a PrologBody, and it can't tell that from the KW_Declare. If there's a directive to increase lookahead in this situation, it will probably solve the problem.

    You could also introduce context into your actions: rather than define PrologHeaderList and PrologBodyList, define PrologRuleList and have the actions throw an error if a header appears after a body. Ugly, but sometimes you have to do it: what appears simple in a grammar may not be simple in the generated parser.

    A hackish approach might be to combine the tokens: rather than KW_Declare and KW_Variable, have your lexer recognize the space and use KW_Declare_Variable. Since both are keywords, you're not going to run into namespace collision problems.