I intended to use bison to parse some scripting language, in this language I can write code like the following:
a = input()
b = a + 1
function myfunc
a = input()
b = a + 1
end function
I found that the block
a = input()
b = a + 1
which appear both in and out of the function definition can be reduced by the same rule stmts
, so I write code like the following
%{
#include <stdio.h>
#include <string>
#include <sstream>
#include <iostream>
#include <stdarg.h>
#include <tuple>
using namespace std;
%}
%debug
%token CRLF EXP FUNCTIONBEGIN FUNCTIONEND
%start program
%%
stmt : EXP
|
stmts : stmt CRLF stmts
| stmt
function : FUNCTIONBEGIN CRLF stmts CRLF FUNCTIONEND
unit : function
| stmts
program : unit
| unit CRLF program
%%
Of course this code can't work due to one shift/reduce conflict
State 3
3 stmts: stmt . CRLF stmts
4 | stmt .
CRLF shift, and go to state 9
CRLF [reduce using rule 4 (stmts)]
$default reduce using rule 4 (stmts)
I thought this conflict is due to both my program
rule and stmts
rule using the same terminal CRLF
as a "binary operator", so I can't solve this conflict by set priority to operators.
Maybe I can merge program
rule and stmts
rule together by somehow adding another two rules to stmt
stmts : function CRLF stmts
| function
However I thought this method(whether it can practically work) is not very beautiful, so I ask if there's some other solutions
The problem has nothing to do with CRLF tokens. Rather, it is your definition of program
. A program
is a series of unit
s where each unit is a function
or a stmts
. But stmts
is not a "unit", which is hinted at by the fact that its name is plural. A stmts
is a series of stmt
s.
So suppose we have a program consisting of three statements. How many unit
s is that? Is it one stmts
consisting of all three statements? Or two of them, one with two statements and the other with just one? Or the other way around? Or even three unit
s, each consisting of a stmts
containing a single statement?
The parser can't tell which of those alternatives is desired because the grammar is ambiguous. And that is what creates the conflict.
The simplest solution is to change the production unit: stmts
to be singular: unit: stmt
. Then there is no ambiguity; the three-statement program
has exactly three unit
s, each a single stmt
.
By the way, you should always prefer left recursion when writing LR grammars. Right recursion doesn't usually create conflicts, and it has nothing to do with your current problem, but it does lead to unnecessarily Iarge parsing stacks, and the reduction of lists like units
and stmts
will execute from right-to-left as the components are popped off the stack, which is often not what is intended.