Search code examples
bisonflex-lexer

Why does bison not match this grammar rule?


I have these lex tokens:

,[Yy] {
    return COMMAY;
}

(?xi:
    ADC|AND|ASL|BCC|BCS|BEQ|BIT|BMI|BNE|BPL|BRK|
    BVC|BVS|CLC|CLD|CLI|CLV|CMP|CPX|CPY|DEC|DEX|
    DEY|EOR|INC|INX|INY|JMP|JSR|LDA|LDX|LDY|LSR|
    NOP|ORA|PHA|PHP|PLA|PLP|ROL|ROR|RTI|RTS|SBC|
    SEC|SED|SEI|STA|STX|STY|TAX|TAY|TSX|TXA|TXS|
    TYA
) {
    yylval.str = strdup(yytext);
    for(char *ptr = yylval.str; *ptr = tolower(*ptr); *ptr++);

    return MNEMONIC;
}

\$[0-9a-fA-F]{2} {
    yylval.str = strdup(yytext);
    return ZEROPAGE;
}

[a-zA-Z_][a-zA-Z0-9_]+ {
    yylval.str = strdup(yytext);
    return IDENTIFIER;
}

And these grammar rules in bison:

expression:
    MNEMONIC '(' zp_identifier ')' COMMAY   { statement($1, $3,     "(zp),y"); }
|   MNEMONIC '(' zp_abs_identifier ')'      { statement($1, $3,     "(zp)"); }
;

zp_identifier:
    ZEROPAGE
|   IDENTIFIER
;

zp_abs_identifier:
    ZEROPAGE
|   ABSOLUTE
|   IDENTIFIER
;

And for some reason this piece of code:

; absolute indirect
lda ($9000)
lda (steps) 

; zp indirect
lda ($90)
lda (gold)

generates a syntax error at lda (steps). I would say this matches the MNEMONIC '(' zp_abs_identifier ')' rule, but it looks like it tries looking for the COMMAY part of the rule above (and fails).

When I remove the rule with COMMAY, this part is parsed correctly.


Solution

  • I've plugged your lexicon and grammar into lex and bison and generated the resulting parsers. When generating the grammar with bison, bison showed that the grammar had 2 reduce/reduce conflicts.

    Technical meaning:

    Shift/Reduce parsers are table-driven and have two actions: shift and reduce. These actions are defined by a table which takes into account the current state and the lookahead token. Bison is a tool that produces these tables automatically.

    A conflict occurs when the parser can choose from multiple actions in a single state. There are two types:

    Normally, Shift/Reduce conflicts are auto-resolved by defaulting to the shift actions. This works often though not always. Thus shift/reduce conflict is not neccessary an issue.

    However, a Reduce/Reduce conflict is a critical issue, there is no proper way to resolve such a conflict. Thus if you see these conflicts you can expect unexpected behaviour.

    Why is there a reduce/reduce conflict?

    The conflict occurs because the parser needs to decide which production rule it must explore. Once the rule is chosen it will not go back. As the production rules have a common part, the parser is smart enough to parse the common part first and decide later on what the production rule should be.

    However, to parse the identifier it must decide to either produce a zp_identifier or zp_abs_identifier nonterminal. As both nonterminals work, it guesses some nonterminal. In this case it defaults to choosing the zp_identifier, because this nonterminal is chosen it forces the production rule to being the first rule.

    This is why it expects COMMAY.

    How to solve this:

    The issue is caused due to lack of lookahead or by using a parsing algorithm that is not strong enough.

    A simple solution would be using the GLR version of bison: https://www.gnu.org/software/bison/manual/html_node/GLR-Parsers.html, This makes the parser explore ambiguous choices separately which will result in both identifiers being explored, later the branch with the wrong identifier is destroyed keeping only the valid branch. Note that if the grammar is inherently ambiguous even GLR cannot save you, grammar reconstruction is the only tool leftover in this case.

    Another method is to remove the reduce/reduce conflict via grammar reconstruction. This is a rather complex method, which is guaranteed to make your grammar less logical. In this case, the issue is caused by the common parts in zp_abs_identifier and zp_identifier.

    I suggest taking out the special case of ABSOLUTE, and making the general case with an optional COMMAY terminal:

    expression:
        MNEMONIC LEFT_PARENTHESIS ABSOLUTE RIGHT_PARENTHESIS
        | MNEMONIC LEFT_PARENTHESIS zp_identifier RIGHT_PARENTHESIS optional_commay
        ;
    
    optional_commay:
        |
        COMMAY
        ;
    
    zp_identifier:
        ZEROPAGE
        | IDENTIFIER
        ;
    

    This way there are no reduce/reduce conflicts, and the special case is still solved.