Search code examples
cparsingbisonyacccompiler-compiler

Bison Shift/Reduce Conflict for a programming language grammar


I'm writing a programming language parser and I'm stuck in this Shift/Reduce Conflict.

Here's the conflict state in the parser.output file obtained via running bison with -v

State 1

   24 ident: TIDENT .
   26 call: TIDENT . TLPAREN args TRPAREN

    TLPAREN  shift, and go to state 24

    TLPAREN   [reduce using rule 24 (ident)]
    $default  reduce using rule 24 (ident)

The conflict is happening when I'm trying to implement a rule for call, it seems to conflict with the normal ident rule.

Here's some parts of the grammar, (actions removed for simplicity but they shouldn't be needed. also I'm not sure if the order in which rules are defined matters, correct me if I'm wrong)

(capital letters are tokens)

The ident rule is simply

ident: TIDENT
          ;

Args, used by call.

args: /* empty */
        |
        expr
        |
        args TCOMMA expr
        ;

Call for calling a function

call:
       TIDENT TLPAREN args TRPAREN
       ;

Expr for expressions

expr:
    number
    |
    ternary
    |
    bool
    |
    string
    |
    ident
    |
    call
    |
    TLPAREN expr TRPAREN
    |
    expr TPLUS expr
    |
    expr TMINUS expr
    |
    expr TSLASH expr
    |
    expr TSTAR expr
    |
    expr TGT expr
    |
    expr TGE expr
    | 
    expr TLT expr
    |
    expr TLE expr
    ;

The question: why does the grammar have a shift/reduce conflict and how do you fix it? I've seen similar style parsers that do not have the conflicts its really weird.

If you need to see the full grammar for reproducing here's a hastebin https://hasteb.in/zozifopi.shell

If you need more details about anything else then please let me know in the comments and I'll edit the question accordingly.


Solution

  • The fundamental problem here is that your grammar is ambiguous because statements don't need to be terminated (stmts: stmts stmt) and a statement can be an expression. So two expressions can appear one after another without any punctuation. That means that f(3) could be one expression (a function call) or two expressions (f and (3)).

    If you're happy for the parser to always interpret that as a function call, (which is its default behaviour, since it prefers to shift), then you could just add a couple of precedence declarations, so that the call has higher precedence than the reduction:

    %precedence TIDENT
    //...
    %precedence TLPAREN
    // ...
    %%
    expr : ident %prec TIDENT
    

    That just papers over the ambiguity, and may cause surprising parses. But the only other solution is to make the language unambiguous.