Search code examples
parsingbisonbisonc++

Trouble With Precedence


I'm trying to generate a parser for a relatively simple grammar, but I'm having trouble getting my precedence to work correctly.

I've managed to get down to 15 shift/reduce errors, but I'm at a loss how to fix these remaining few - and they're probably related to my precedence issues.

I have a grammar defined as such:

%nonassoc T_LEN T_OPENPAREN T_CLOSEPAREN T_NUM T_ID T_QUOTE T_TRUE T_FALSE
%left T_GT T_GTE T_LT T_LTE T_EQ
%left T_OR
%left T_AND
%left T_NOT

Start           : Expression                                    { astRoot = new MainNode( $1 ); }
                ;

Expression      : Expression T_OR Expression                    { $$ = new OrNode( $1, $3 ); }
                | Expression T_AND Expression                   { $$ = new AndNode( $1, $3 ); }
                | Expression Expression                         { $$ = new AndNode( $1, $2 ); }
                | T_NOT Expression                              { $$ = new NotNode( $2 ); }
                | T_GT Expression                               { $$ = new GreaterNode( $2 ); }
                | T_GTE Expression                              { $$ = new GreaterEqualNode( $2 ); }
                | T_LT Expression                               { $$ = new LessNode( $2 ); }
                | T_LTE Expression                              { $$ = new LessEqualNode( $2 ); }
                | T_EQ Expression                               { $$ = new EqualNode( $2 ); }
                | T_LEN T_OPENPAREN T_NUM T_CLOSEPAREN          { $$ = new LenNode( new IntegerNode( $3 ) ); }
                | T_OPENPAREN Expression T_CLOSEPAREN           { $$ = $2; }
                | T_TRUE                                        { $$ = new BoolTrueNode; }
                | T_FALSE                                       { $$ = new BoolFalseNode; }
                | T_ID                                          { $$ = new IdentifierNode( $1 ); }
                | T_NUM                                         { $$ = new IntegerNode( $1 ); }
                | T_QUOTE                                       { $$ = new QuoteNode( new IdentifierNode( $1 ) ); }
                ;

When trying to parse the following I don't get the values I want:

100 T_AND 200 T_AND 300 T_AND 400 T_OR 1000
output => And( "100", And( "200", And( "300", Or( "400", "1000" ) ) ) );
expect => Or( And( "100", And( "200", And( "300", "400" ) ) ), "1000" );

Finally the relevant parts of my parser output is:

State 27 conflicts: 15 shift/reduce

0 $accept: Start $end

1 Start:       Expression

2 Expression:  Expression T_OR Expression
3            | Expression T_AND Expression
4            | Expression Expression
5            | T_NOT Expression
6            | T_GT Expression
7            | T_GTE Expression
8            | T_LT Expression
9            | T_LTE Expression
10           | T_EQ Expression
11           | T_LEN T_OPENPAREN T_NUM T_CLOSEPAREN
12           | T_OPENPAREN Expression T_CLOSEPAREN
13           | T_TRUE
14           | T_FALSE
15           | T_ID
16           | T_NUM
17           | T_QUOTE

State 27

    2 Expression: Expression . T_OR Expression
    3           | Expression . T_AND Expression
    4           | Expression . Expression
    4           | Expression Expression .

    T_LEN        shift, and go to state 1
    T_OPENPAREN  shift, and go to state 2
    T_NUM        shift, and go to state 3
    T_ID         shift, and go to state 4
    T_QUOTE      shift, and go to state 5
    T_TRUE       shift, and go to state 6
    T_FALSE      shift, and go to state 7
    T_GT         shift, and go to state 8
    T_GTE        shift, and go to state 9
    T_LT         shift, and go to state 10
    T_LTE        shift, and go to state 11
    T_EQ         shift, and go to state 12
    T_OR         shift, and go to state 25
    T_AND        shift, and go to state 26
    T_NOT        shift, and go to state 13

    T_LEN        [reduce using rule 4 (Expression)]
    T_OPENPAREN  [reduce using rule 4 (Expression)]
    T_NUM        [reduce using rule 4 (Expression)]
    T_ID         [reduce using rule 4 (Expression)]
    T_QUOTE      [reduce using rule 4 (Expression)]
    T_TRUE       [reduce using rule 4 (Expression)]
    T_FALSE      [reduce using rule 4 (Expression)]
    T_GT         [reduce using rule 4 (Expression)]
    T_GTE        [reduce using rule 4 (Expression)]
    T_LT         [reduce using rule 4 (Expression)]
    T_LTE        [reduce using rule 4 (Expression)]
    T_EQ         [reduce using rule 4 (Expression)]
    T_OR         [reduce using rule 4 (Expression)]
    T_AND        [reduce using rule 4 (Expression)]
    T_NOT        [reduce using rule 4 (Expression)]
    $default     reduce using rule 4 (Expression)

    Expression  go to state 27

The best I can understand is that the issue may be caused by an implicit and ( rules 3 and 4 ). If this is the root of the issue, how do I go about fixing it?

Thanks!


Solution

  • While writing this question, I only just realized that the implicit and may be the root of the issue. Now with a more solid understanding of what might be wrong I was able to write the correct google query that lead me to this post.

    After reading the answer, the solution sort of just fell in place. I opted for an explicit precedence through the grammar rather than an implicit precedence through bison's precedence declarations. This gave me the following grammar:

    Start           : Expression                                    { astRoot = new MainNode( $1 ); }
                    ;
    
    Term            : T_TRUE                                        { $$ = new BoolTrueNode; }
                    | T_FALSE                                       { $$ = new BoolFalseNode; }
                    | T_ID                                          { $$ = new IdentifierNode( $1 ); }
                    | T_NUM                                         { $$ = new IntegerNode( $1 ); }
                    | T_QUOTE                                       { $$ = new QuoteNode( new IdentifierNode( $1 ) ); }
                    | T_LEN T_OPENPAREN T_NUM T_CLOSEPAREN          { $$ = new LenNode( new IntegerNode( $3 ) ); }
                    | T_OPENPAREN Expression T_CLOSEPAREN           { $$ = $2; }
                    ;
    
    Prefix          : Term
                    | T_NOT Prefix                                  { $$ = new NotNode( $2 ); }
                    | T_GT Term                                     { $$ = new GreaterNode( $2 ); }
                    | T_GTE Term                                    { $$ = new GreaterEqualNode( $2 ); }
                    | T_LT Term                                     { $$ = new LessNode( $2 ); }
                    | T_LTE Term                                    { $$ = new LessEqualNode( $2 ); }
                    | T_EQ Term                                     { $$ = new EqualNode( $2 ); }
                    ;
    
    Factor          : Prefix
                    | Prefix Factor                                 { $$ = new AndNode( $1, $2 ); }
                    ;
    
    Andor           : Factor
                    | Andor T_AND Factor                            { $$ = new AndNode( $1, $3 ); }
                    ;
    
    Expression      : Andor
                    | Expression T_OR Andor                         { $$ = new OrNode( $1, $3 ); }
                    ;
    

    And this gave me 0 shift/reduce conflicts and the values I'm expecting in all of my tests thus far. Also arranging my grammar this way made the more subtle nuances of the language apparent such as the validity of the following statements:

    T_NOT T_GT 100 => Not( Greater( "100" ) ) => valid
    T_GT T_NOT 100 => Greater( Not( "100" ) ) => invalid
    

    Unless anyone sees any hidden issues with the grammar I've presented, I believe this to have fully solved my problem.