Search code examples
parsingbisonlalrjison

Conflict in grammar: multiple actions possible


I've tried to write simple parser with jison (http://zaa.ch/jison/docs/) at stuck in describing text.

%lex

%%
[\s\n\t]+                   return 'TK_SPACE';
[0-9]+("."[0-9]+)?\b        return 'TK_NUMBER';
[a-zA-Z]+([a-zA-Z0-9]+)?\b  return 'TK_WORD';
<<EOF>>                     return 'EOF';

/lex

%start document

%%

document
  : nodes EOF
    { console.log($1); }
  | EOF
  ;

nodes
  : nodes node
    { $1.push($2); $$ = $1; }
  | node
    { $$ = [$1]; }
  ;

node
  : text
  ;

text
  : text text_element
    { $$ = $1 + $2; }
  | text_element
  ;

text_element
  : TK_NUMBER
  | TK_WORD
  | TK_SPACE
  ;

This grammar compiled with warnings.

Conflict in grammar: multiple actions possible when lookahead token is TK_SPACE in state 5
- reduce by rule: node -> text
- shift token (then go to state 9)
Conflict in grammar: multiple actions possible when lookahead token is TK_WORD in state 5
- reduce by rule: node -> text
- shift token (then go to state 8)
Conflict in grammar: multiple actions possible when lookahead token is TK_NUMBER in state 5
- reduce by rule: node -> text
- shift token (then go to state 7)

States with conflicts:
State 5
  node -> text . #lookaheads= TK_SPACE TK_WORD TK_NUMBER EOF
  text -> text .text_element #lookaheads= EOF TK_NUMBER TK_WORD TK_SPACE
  text_element -> .TK_NUMBER
  text_element -> .TK_WORD
  text_element -> .TK_SPACE

But if i try to parse text it works fine. This is not a full version of code, just version with text. I want to append nodes at node in feature.


Solution

  • The problem is your grammar is ambiguous -- a nodes consists of a sequence of one or mode node with no separators. A node is a text which consists of one or more text_element again with no separators. So there's no way to tell when one node ends and the next on begins.

    As an example, if you have a sequence of 3 text_elements in you input, it might be a single node with all 3, or it might be 3 node each with one.

    Bison will "resolve" this conflict by always preferring shift over reduce, which will always prefer making larger text objects so the rule nodes: nodes node will never be reduced, and might as well just be removed from the grammar. Since this is a pure ambiguity (not a lookahead problem) the resulting grammar matches the same language, so that may not be a problem. I assume jison (or whatever parser generator you are actually using) is the same.

    In general, however, conflicts are a problem because it means that the grammar parsed by the generated parser is not the grammar you specififed. Figuring out what grammar is actually parsed by the resulting parser is non-trivial and requires careful understanding of how shoft-reduce parsing works and the states actually generarated by the parser generator. The information is all there in the .output file (produced by bison with -v -- other generators may be different), but you need to read and understand it.