Search code examples
parsingcompiler-constructionyaccocamlyacc

Attempting to resolve shift-reduce parsing issue


I'm attempting to write a grammar for C and am having an issue that I don't quite understand. Relevant portions of the grammar:

stmt :
  types decl SEMI                { marks (A.Declare ($1, $2)) (1, 2) }
 | simp SEMI                     { marks $1 (1, 1) }
 | RETURN exp SEMI               { marks (A.Return $2) (1, 2) }
 | control                       { $1 } 
 | block                         { marks $1 (1, 1) }
 ;

 control :
   if                                                    { $1 }
  | WHILE RPAREN exp LPAREN stmt                         { marks (A.While ($3, $5)) (1, 5) }
  | FOR LPAREN simpopt SEMI exp SEMI simpopt RPAREN stmt { marks (A.For ($3, $5, $7, $9)) (1, 9) }
  ;

  if :
    IF RPAREN exp LPAREN stmt                                { marks (A.If ($3, $5, None)) (1, 5) }
  | IF RPAREN exp LPAREN stmt ELSE stmt                      { marks (A.If ($3, $5, $7)) (1, 7) }
  ;

This doesn't work. I ran ocamlyacc -v and got the following report:

83: shift/reduce conflict (shift 86, reduce 14) on ELSE
state 83
    if : IF RPAREN exp LPAREN stmt .  (14)
    if : IF RPAREN exp LPAREN stmt . ELSE stmt  (15)

    ELSE  shift 86
    IF  reduce 14
    WHILE  reduce 14
    FOR  reduce 14
    BOOL  reduce 14
    IDENT  reduce 14
    RETURN  reduce 14
    INT  reduce 14
    MAIN  reduce 14
    LBRACE  reduce 14
    RBRACE  reduce 14
    LPAREN  reduce 14

I've read that shift/reduce conflicts are due to ambiguity in the specification of the grammar, but I don't see how I can specify this in a way that isn't ambiguous?


Solution

  • The grammar is certainly ambiguous, although you know what every string means, and furthermore despite the fact that ocamlyacc reports a shift/reduce conflict, its generated grammar will also produce the correct parse for every valid input.

    The ambiguity comes from

    if ( exp1 ) if ( exp2) stmt1 else stmt2;
    

    Clearly stmt1 only executes if both exp1 and exp2 are true. But does stmt1 execute if exp1 is false, or if exp1 is true and exp2 is false? Those represent different parses; the first (invalid) parse attaches else stmt2 to if (exp1), while the parse that you, I and ocamlyacc know to be correct attaches else stmt2 to if (exp2).

    The grammar can be rewritten, although it's a bit of a nuisance. The basic idea is to divide statements into two categories: "matched" (which means that every else in the statement is matched with some if) and "unmatched" (which means that a following else would match some if in the statement. A complete statement may be unmatched, because else clauses are optional, but you can never have an unmatched statement between an if and an else, because that else must match an if in the unmatched statement.

    The following grammar is basically the one you provided, but rewritten to use bison-style single-quoted tokens, which I find more readable. I don't know if ocamlyacc handles those. (By the way, your grammar says IF RPAREN exp LPAREN... which, with the common definition of left and right parentheses, would mean if ) exp (. That's one reason I find single-quoted character terminals much more readable.)

    Bison handles this grammar with no conflicts.

     /* Fake non-terminals */
    %token types decl simp exp
     /* Keywords */
    %token ELSE FOR IF RETURN WHILE
    
    %%
    stmt: matched_stmt | unmatched_stmt ;
    stmt_list: stmt | stmt_list stmt ;
    block: '{' stmt_list '}' ;
    
    matched_stmt
     : types decl ';' 
     | simp ';'
     | RETURN exp ';'
     | block
     | matched_control
     ;
    
    simpopt : simp | /* EMPTY */;
    
    matched_control
      : IF '(' exp ')' matched_stmt ELSE matched_stmt
      | WHILE '(' exp ')' matched_stmt
      | FOR '(' simpopt ';' exp ';' simpopt ')' matched_stmt
      ;
    
    unmatched_stmt
      : IF '(' exp ')' stmt
      | IF '(' exp ')' matched_stmt ELSE unmatched_stmt
      | WHILE '(' exp ')' unmatched_stmt
      | FOR '(' simpopt ';' exp ';' simpopt ')' unmatched_stmt
      ;
    

    Personally, I'd refactor a bit. Eg:

    if_prefix  : IF '(' exp ')'
               ;
    loop_prefix: WHILE '(' exp ')'
               | FOR '(' simpopt ';' exp ';' simpopt ')'
               ;
    matched_control
               : if_prefix matched_stmt ELSE matched_stmt
               | loop_prefix matched_stmt
               ;
    unmatched_stmt
               : if_prefix stmt
               | if_prefix ELSE unmatched_stmt
               | loop_prefix unmatched_stmt
               ;
    

    A common and simpler but less rigorous solution is to use precedence declarations as suggested in the bison manual.