Search code examples
parsingsyntaxbisonbnf

Bison shift/reduce conflict for do block


I'm having trouble solving a shift/reduce conflict.

I'm trying to write a while loop syntax:

while expression do code() end

The problem lies in the do keyword.

One of my expressions is a call expression that also accepts an optional do block for callbacks, e.g:

function_call() do print("callback") end

That in turns seems to cause the shift/reduce conflict in the while loop's do keyword.

So if i do:

while call() do stuff() end

It instead tries to match the do with the function call and messes up the entire parsing.

Ruby has a very similar syntax, but it seems to correctly favor the do keyword for the while loop rather than the expression. in Ruby you can put the callback in parenthesis if needed, which is ideally how I'd want to solve this too:

while (call() do stuff() end) do more_stuff() end

How do I get over this? I messed around a lot with operator precedence definitions but nothing seemed to work, how would you solve this? How did ruby manage to do it right?

Here's a complete grammar that reproduces the issue:

%token IDENT DO END WHILE

%%

program:
       %empty
       |
       stmts
       ;

stmts:
     stmt
     |
     stmts ';' stmt
     |
     stmts ';'
     ;

opt_stmts:
         %empty
         |
         stmts
         ;

opt_semi:
        %empty
        |
        ';'
        ;

term:
    ';'
    |
    DO opt_semi
    ;

while_loop:
          WHILE expr term opt_stmts END
          |
          WHILE term opt_stmts END
          ;

stmt:
    expr
    |
    while_loop
    ;

expr:
    IDENT
    |
    call
    |
    '(' expr ')'
    ;

do_block:
        DO '|' args '|' opt_stmts END
        |
        DO opt_stmts END
        ;

call:
    IDENT '(' args ')'
    |
    IDENT '(' args ')' do_block
    |
    IDENT do_block
    ;

args:
    %empty
    |
    expr
    |
    args ',' expr
    |
    args ','
    ;

%%

Side note: I'm using Jison which is a bison clone for JavaScript as I'm writing a compiler for JavaScript, however that shouldn't be a problem as this is a general grammar issue and the above minimal snippet I wrote has been ran through original Bison as well.


Solution

  • You've identified the problem -- the ambiguity of a DO that might be part of a WHILE..DO or might be part of a call-DO. This is particularly hard, as the DO is optional in both cases, but the syntax is such that some DOs might only associate with a WHILE or a call depending on various things after them, which means that the non-ambiguous cases can't be identified without more lookahead anyways. It also runs into some LALR vs LR issues that also make it tough (and also mean that you can't use precedence to solve it.)

    You can solve it by refactoring the grammar in much the same way as one does for the dangling else problem -- spliting the expr rule and only allowing expressions that don't end with a call-DO in a WHILE statement. That way, a DO after a WHILE (and not in parens) will always be associated with the WHILE and not a call. It does slightly change the language -- a WHILE..DO with the optional ';' present with a call-DO in between (without parens) will be rejected as a syntax error (the first DO will match with the WHILE and the second DO can't be a call-DO as the DO is immediately followed by ';'.

    For your example, this means splitting the expr rule into call-DO case and all other expressions case:

    expr: non_call_do_expr | call_withdo ;
    non_call_do_expr: IDENT | call_nodo | '(' expr ')' ;
    
    
    call_nodo: IDENT '(' args ')' ;
    
    call_withdo:
        IDENT '(' args ')' do_block |
        IDENT do_block ;
    

    if you add any additional expr rules they need to be added to the non_call_do_expr rule unless they (can) end with a call-DO. In which case you need two versions for the two cases, exactly as is needed for statements to resolve the dangling-else case. The while then becomes:

    while_loop:
              WHILE non_call_do_expr term opt_stmts END |
              WHILE term opt_stmts END ;