Search code examples
pythonyaccbnfply

Trouble parsing elif using PLY


I'm continuing my journey into parsing using PLY. I've run into a problem when parsing IF and ELSE IF statements.

Here is my code, I've stripped out the irrelevant bits and just left it with the basics (this will run on Python 2.7).

from ply import lex, yacc

tokens = [
    'ELSE',
    'IF',
    'LPAREN',
    'RPAREN',
    'LCURLY',
    'RCURLY'
    ]


t_IF = r'IF'
t_ELSE = r'ELSE'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_LCURLY = r'\{'
t_RCURLY = r'\}'
t_ignore = '\r\t '


def t_error(t):
    print t


def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)


def p_error(p):
    print ('error', p)


def p_empty(p):
    'empty :'
    p[0] = None


def p_if_statement(p):
    'if_statement : IF LPAREN RPAREN LCURLY RCURLY elif_statements_or_empty else_statement_or_empty'
    p[0] = ('if_statement', (p[6]))
    print p[0]


def p_else_statement(p):
    'else_statement : ELSE LCURLY RCURLY'
    p[0] = ('else_statement')


def p_else_statement_or_empty(p):
    '''else_statement_or_empty : else_statement
                               | empty'''
    p[0] = p[1]


def p_elif_statement(p):
    'elif_statement : ELSE IF LPAREN RPAREN LCURLY RCURLY'
    p[0] = ('elif_statement')


def p_elif_statements_1(p):
    'elif_statements : elif_statement'
    p[0] = [p[1]]


def p_elif_statements_2(p):
    'elif_statements : elif_statements elif_statement'
    p[0] = p[1] + [p[2]]


def p_elif_statements_or_empty(p):
    '''elif_statements_or_empty : elif_statements
                                | empty'''
    p[0] = p[1]


lexer = lex.lex()

s = 'IF(){}ELSE IF(){}ELSE{}'

parser = yacc.yacc(start='if_statement')
parser.parse(s)  # ('error', LexToken(LCURLY,'{',1,21))

Essentially, the problem is that I'm searching for an infinite amount of elif_statements. When it encounters an ELSE {, it thinks that there is a parsing error because it expected an IF after the ELSE.

Is this what's called a "shift/reduce error"? What are some ways that I can resolve this?

Any help is greatly appreciated!


Solution

  • Yes, that is the result of a shift/reduce conflict. PLY warns you that you have two of these, and they are both somewhat related to the else clauses.

    However, the problem has nothing to do with the potentially infinite sequence of else if clauses. That is not a problem. The problem is more subtle and has to do with your use of empty productions.

    The most important aspect of LR(1), from the point of view of figuring out issues, is that every non-terminal needs to be reduced (that is, created from its right-hand side) before the immediately following token is shifted. With that in mind, let's take a look at the grammar for if_statement:

    if_statement : IF LPAREN RPAREN LCURLY RCURLY 
                   elif_statements_or_empty
                   else_statement_or_empty
    

    and let's suppose that we have seen IF(){}, so the next token is ELSE. At this point in the right-hand side (which is the only active item in the parse at this point) we know that the next token must be either ELSE or the end of the input. In the latter case, it's clear what must be done: an empty must be reduced and then an elif_statements_or_empty must reduced from that empty; next another empty must be reduced and an else_statement_or_empty must be reduced from that empty. That all works.

    But suppose the next token is ELSE. Now what? Since we cannot see any further than the ELSE, we don't know whether it is followed by IF or (. But we have to decide between two options:

    1. Reduce an empty and then an elif_statements_or_empty. That will be correct if the ELSE is the beginning of else_statement_or_empty (which in this case cannot be empty).

    2. Shift the ELSE, which can then only be the start of elif_statements_or_empty, which cannot be empty and thus must be elif_statements. That will be correct if the ELSE is part of an ELSE IF clause.

    So that's one shift/reduce conflict. Bison always chooses the shift in a shift/reduce conflict, so it will assume that the first ELSE following the IF is the start of an ELSE IF clause. Let's continue with that assumption (which happens to be correct in the case of the provided input), and continue until we reach the following ELSE. We've now seen IF(){}ELSE IF(){} and we're again staring at an ELSE.

    At this point, it is absolutely clear that ELSE IF(){} must be reduced to else_if_statement and it is also clear that else_if_statement must be reduced to else_if_statements. So far, so good. But now comes the problem. If there will be another ELSE IF, that's enough reducing and we should shift the ELSE to be part of the following else_if_statement. But if the ELSE is the start of the final else_statement, then we need to reduce the else_if_statements we just created into an else_if_statements_or_empty, since that reduction will be needed before we can shift the ELSE which starts else_statement_or_empty.

    Again, bison always goes for the shift. But that means that it cannot ever decide to start an else_statement_or_empty.


    So, what to do?

    The first thing to note is that the elif_statements_or_empty production really is not adding any value at all. (Indeed, neither is the empty production, which I was going to just eliminate silently, but I'll leave this parenthetic obituary.) If we just put elif_statements in the if_statement, then we would not need to decide between elif_statements and elif_statements_or_empty. Of course, we would need elif_statements to match an empty string, but that's trivial: we just provide an empty right-hand side for the recursive base case. That is, instead of

    elif_statements: elif_statement | elif_statements elif_statement
    

    we will use

    elif_statements: | elif_statements elif_statement
    

    Instead of matching any number but at least one elif_statement, we will now match any number including zero elif_statements. Which is exactly what we want.

    Now that the elif_statements base case is empty, we no longer need to decide after the IF(){} whether or not we will have an empty set of elif_statements. We can just reduce an empty elif_statements and then see whether what comes next is an elif_statement or an else_statement. So with that simplification, we have actually eliminated both shift-reduce conflicts.

    In summary, the new PLY file: (parser rules only, nothing else has changed):

    def p_error(p):
        print ('error', p)
    
    # Fixed to return both elif and else statements    
    def p_if_statement(p):
        'if_statement : IF LPAREN RPAREN LCURLY RCURLY elif_statements else_statement'
        p[0] = ('if_statement', (p[6] + p[7]))
        print p[0]
    
    # This needs to return a list because it could be empty
    def p_else_statement(p):
        'else_statement : ELSE LCURLY RCURLY'
        p[0] = ['else_statement']
    
    def p_elif_statement(p):
        'elif_statement : ELSE IF LPAREN RPAREN LCURLY RCURLY'
        p[0] = 'elif_statement'
    
    def p_elif_statements(p):
        'elif_statements : elif_statements elif_statement'
        p[0] = p[1] + [p[2]]
    
    # This handles both empty elif_statements and an empty else
    # by reducing to an empty list
    def p_empty(p):
        '''elif_statements : 
           else_statement  :'''
        p[0] = []