I'm continuing my journey into parsing using PLY. I've run into a problem when parsing IF
and ELSE IF
statements.
Here is my code, I've stripped out the irrelevant bits and just left it with the basics (this will run on Python 2.7).
from ply import lex, yacc
tokens = [
'ELSE',
'IF',
'LPAREN',
'RPAREN',
'LCURLY',
'RCURLY'
]
t_IF = r'IF'
t_ELSE = r'ELSE'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_LCURLY = r'\{'
t_RCURLY = r'\}'
t_ignore = '\r\t '
def t_error(t):
print t
def t_newline(t):
r'\n+'
t.lexer.lineno += len(t.value)
def p_error(p):
print ('error', p)
def p_empty(p):
'empty :'
p[0] = None
def p_if_statement(p):
'if_statement : IF LPAREN RPAREN LCURLY RCURLY elif_statements_or_empty else_statement_or_empty'
p[0] = ('if_statement', (p[6]))
print p[0]
def p_else_statement(p):
'else_statement : ELSE LCURLY RCURLY'
p[0] = ('else_statement')
def p_else_statement_or_empty(p):
'''else_statement_or_empty : else_statement
| empty'''
p[0] = p[1]
def p_elif_statement(p):
'elif_statement : ELSE IF LPAREN RPAREN LCURLY RCURLY'
p[0] = ('elif_statement')
def p_elif_statements_1(p):
'elif_statements : elif_statement'
p[0] = [p[1]]
def p_elif_statements_2(p):
'elif_statements : elif_statements elif_statement'
p[0] = p[1] + [p[2]]
def p_elif_statements_or_empty(p):
'''elif_statements_or_empty : elif_statements
| empty'''
p[0] = p[1]
lexer = lex.lex()
s = 'IF(){}ELSE IF(){}ELSE{}'
parser = yacc.yacc(start='if_statement')
parser.parse(s) # ('error', LexToken(LCURLY,'{',1,21))
Essentially, the problem is that I'm searching for an infinite amount of elif_statement
s. When it encounters an ELSE {
, it thinks that there is a parsing error because it expected an IF
after the ELSE
.
Is this what's called a "shift/reduce error"? What are some ways that I can resolve this?
Any help is greatly appreciated!
Yes, that is the result of a shift/reduce conflict. PLY warns you that you have two of these, and they are both somewhat related to the else
clauses.
However, the problem has nothing to do with the potentially infinite sequence of else if
clauses. That is not a problem. The problem is more subtle and has to do with your use of empty productions.
The most important aspect of LR(1), from the point of view of figuring out issues, is that every non-terminal needs to be reduced (that is, created from its right-hand side) before the immediately following token is shifted. With that in mind, let's take a look at the grammar for if_statement
:
if_statement : IF LPAREN RPAREN LCURLY RCURLY
elif_statements_or_empty
else_statement_or_empty
and let's suppose that we have seen IF(){}
, so the next token is ELSE
. At this point in the right-hand side (which is the only active item in the parse at this point) we know that the next token must be either ELSE
or the end of the input. In the latter case, it's clear what must be done: an empty
must be reduced and then an elif_statements_or_empty
must reduced from that empty
; next another empty
must be reduced and an else_statement_or_empty
must be reduced from that empty
. That all works.
But suppose the next token is ELSE
. Now what? Since we cannot see any further than the ELSE
, we don't know whether it is followed by IF
or (
. But we have to decide between two options:
Reduce an empty
and then an elif_statements_or_empty
. That will be correct if the ELSE
is the beginning of else_statement_or_empty
(which in this case cannot be empty).
Shift the ELSE
, which can then only be the start of elif_statements_or_empty
, which cannot be empty and thus must be elif_statements
. That will be correct if the ELSE
is part of an ELSE IF
clause.
So that's one shift/reduce conflict. Bison always chooses the shift in a shift/reduce conflict, so it will assume that the first ELSE
following the IF
is the start of an ELSE IF
clause. Let's continue with that assumption (which happens to be correct in the case of the provided input), and continue until we reach the following ELSE
. We've now seen IF(){}ELSE IF(){}
and we're again staring at an ELSE
.
At this point, it is absolutely clear that ELSE IF(){}
must be reduced to else_if_statement
and it is also clear that else_if_statement
must be reduced to else_if_statements
. So far, so good. But now comes the problem. If there will be another ELSE IF
, that's enough reducing and we should shift the ELSE
to be part of the following else_if_statement
. But if the ELSE
is the start of the final else_statement
, then we need to reduce the else_if_statements
we just created into an else_if_statements_or_empty
, since that reduction will be needed before we can shift the ELSE
which starts else_statement_or_empty
.
Again, bison always goes for the shift. But that means that it cannot ever decide to start an else_statement_or_empty
.
So, what to do?
The first thing to note is that the elif_statements_or_empty
production really is not adding any value at all. (Indeed, neither is the empty
production, which I was going to just eliminate silently, but I'll leave this parenthetic obituary.) If we just put elif_statements
in the if_statement
, then we would not need to decide between elif_statements
and elif_statements_or_empty
. Of course, we would need elif_statements
to match an empty string, but that's trivial: we just provide an empty right-hand side for the recursive base case. That is, instead of
elif_statements: elif_statement | elif_statements elif_statement
we will use
elif_statements: | elif_statements elif_statement
Instead of matching any number but at least one elif_statement
, we will now match any number including zero elif_statement
s. Which is exactly what we want.
Now that the elif_statements
base case is empty, we no longer need to decide after the IF(){}
whether or not we will have an empty set of elif_statements
. We can just reduce an empty elif_statements
and then see whether what comes next is an elif_statement
or an else_statement
. So with that simplification, we have actually eliminated both shift-reduce conflicts.
In summary, the new PLY file: (parser rules only, nothing else has changed):
def p_error(p):
print ('error', p)
# Fixed to return both elif and else statements
def p_if_statement(p):
'if_statement : IF LPAREN RPAREN LCURLY RCURLY elif_statements else_statement'
p[0] = ('if_statement', (p[6] + p[7]))
print p[0]
# This needs to return a list because it could be empty
def p_else_statement(p):
'else_statement : ELSE LCURLY RCURLY'
p[0] = ['else_statement']
def p_elif_statement(p):
'elif_statement : ELSE IF LPAREN RPAREN LCURLY RCURLY'
p[0] = 'elif_statement'
def p_elif_statements(p):
'elif_statements : elif_statements elif_statement'
p[0] = p[1] + [p[2]]
# This handles both empty elif_statements and an empty else
# by reducing to an empty list
def p_empty(p):
'''elif_statements :
else_statement :'''
p[0] = []