I am trying to make a parser calculator in Python using PLY. I began with some PLY example code and worked my way from there. What I am trying to add is functionality for continuing calculations on the previous result. So if you type '4 + 5' the result is 9. If you then type '* 2 - 3', the new result should be 15, but with my code it is -9 because it parses '2 - 3' first, when it should parse '9 * 2' first. This issue occurs when I use multiplication or division as the first operator when doing a calculation with the previous result.
As shown in my code excerpt I tried giving precedence to the expressions that used the previous result, but I still have the same problem.
'r' is a variable which stores the previous result.
tokens = (
'NUMBER',
)
literals = ['=', '+', '-', '*', '/', '(', ')']
precedence = (
('left', '+', '-'),
('right', 'RADD', 'RSUB'),
('left', '*', '/'),
('right', 'RMUL', 'RDIV'),
('right', 'UMINUS'),
)
def p_statement_expr(p):
'statement : expression'
p[0] = p[1]
def p_expression_binop(p):
'''expression : expression '+' expression
| expression '-' expression
| expression '*' expression
| expression '/' expression'''
if p[2] == '+':
p[0] = p[1] + p[3]
elif p[2] == '-':
p[0] = p[1] - p[3]
elif p[2] == '*':
p[0] = p[1] * p[3]
elif p[2] == '/':
p[0] = p[1] / p[3]
def p_expression_cont(p):
'''statement : '+' expression %prec RADD
| '-' expression %prec RSUB
| '*' expression %prec RMUL
| '/' expression %prec RDIV '''
if p[1] == '+':
p[0] = r + p[2]
elif p[1] == '-':
p[0] = r - p[2]
elif p[1] == '*':
p[0] = r * p[2]
elif p[1] == '/':
p[0] = r / p[2]
def p_expression_uminus(p):
"expression : '(' '-' expression ')' %prec UMINUS"
def p_expression_group(p):
"expression : '(' expression ')'"
p[0] = p[2]
def p_expression_number(p):
"expression : NUMBER"
p[0] = p[1]
I also attempted to change the grammar to
p_expression_cont(p):
'''expression : '+' expression %prec MORADD
| '-' expression %prec MORSUB
| '*' expression %prec MORMUL
| '/' expression %prec MORDIV '''
which solved my initial issue, but now something like '++-*++/23' is parsable, which I obviously do not want. How do I modify my grammar so that I get the correct calculations when using the previous result?
This is easily doable with PLY (or any other yacc-like parser generator) but it's important to have some understanding of the nature of the thing you are trying to parse.
Intuitively, a continuation line like * 2 - 3
should be parsed as though it were written as <prev> * 2 - 3
where <prev>
is a non-terminal which somehow represents the previous value.
For a simple case, we could just define
expression : '$'
so that the previous value is represented by the explicit token $
. Then the rest of the grammar would just work.
Of course, the intention is that the user doesn't have to type $
(although, really, they might find it useful. See below.) So we instead need
expression: prev
where the non-terminal prev
must represent the empty token sequence. There's no problem with a non-terminal being represented by zero input tokens, but obviously we need to restrict the circumstances under which this can happen. So we can't just add:
prev :
to the grammar, because that will lead to the problem you already encountered: it makes 2**4
valid, as though it had been 2*<prev>*4
.
Clearly, we want the grammar to only manufacture an empty <prev>
right at the beginning of a statement. It only remains to figure out how to do this.
Let's suppose that we have two kinds of expression
: one kind allows an empty left-hand side, and the other one doesn't. How would we define these two non-terminals?
The second one is just the usual definition:
expression : expression '+' expression
expression : expression '-' expression
expression : expression '*' expression
expression : expression '/' expression
expression : '-' atom # See note below
expression : atom
atom : NUMBER
atom : '(' expression ')'
(I separated out atom
productions for reasons which will be seen below).
Now, what about the expressions whose left-hand operand could be implicit? These are pretty similar:
expr_cont : expr_cont '+' expression
expr_cont : expr_cont '-' expression
expr_cont : expr_cont '*' expression
expr_cont : expr_cont '/' expression
expr_cont : atom
But there is one extra case:
expr_cont :
In the first four productions, we're saying that in a expr_cont
with an operator, the right-hand operand must be an ordinary expression
, but the left-hand operand can be an expression with an implicit left-hand operand. In the last production, we allow the implicit left-hand operand (but only for this kind of expression). Explicit left-hand operands -- numbers and parenthesized expressions -- are the same, so we can reuse the atom
non-terminal created above.
Note about unary minus: In the original grammar, unary minus was only allowed within parentheses, and its precedence was incorrect. The original production was:
expression: '(' '-' expression ')' %prec UMINUS
In this production, the precedence declaration is completely pointless, because the production itself is never ambiguous. (It is unambiguously surrounded by parentheses, so it must be reduced when the closing parenthesis is encountered.) The application of the unary minus inside the parentheses is also unambiguous, but incorrect: the production effectively specifies that the unary minus applies to the entire expression enclosed in parentheses, so that (-4 - 2)
will evaluate to -2 (-(4-2)) rather than the expected -6.
The simple fix applied in the above grammar explicitly indicates that the operand to unary minus is an atom
, not an expression
. Consequently, -4 - 2
must be parsed as (-4) - 2
, because 4 - 2
cannot be an atom
. However, that fix removes the requirement that expressions starting with a unary minus be parenthesized.
The fact that the unary minus production is only in expression
and not expr_cont
means that continuation expressions starting with a minus sign will be parsed as such, which was presumably the intention of the restriction.
Now the implementation is straight-forward:
precedence = (
('left', '+', '-'),
('left', '*', '/'),
)
# We can use the same function for all unit productions which pass
# through the semantic value of the right-hand symbol.
# (A unit production is one whose right-hand side has a single symbol.)
def p_unit(p):
'''statement : expr_cont
expression : atom
expr_cont : atom
atom : NUMBER
'''
p[0] = p[1]
# Personally, I would write this as four functions, one for each operator,
# rather than executing the chain of if statements every time.
def p_expression_binop(p):
'''expression : expression '+' expression
| expression '-' expression
| expression '*' expression
| expression '/' expression
expr_cont : expr_cont '+' expression
| expr_cont '-' expression
| expr_cont '*' expression
| expr_cont '/' expression
'''
if p[2] == '+':
p[0] = p[1] + p[3]
elif p[2] == '-':
p[0] = p[1] - p[3]
elif p[2] == '*':
p[0] = p[1] * p[3]
elif p[2] == '/':
p[0] = p[1] / p[3]
def p_expression_uminus(p):
'''expresion : '-' atom
'''
p[0] = -p[2]
def p_atom_group(p):
'''atom : '(' expression ')'
'''
p[0] = p[2]
def p_expr_previous(p):
'''expr_cont :
'''
p[0] = r # "previous" would be a better variable name than r
As mentioned above, there might be occasions where the user would find it handy to have an explicit way of writing "the previous value". Suppose, for example that they wanted to see the result of <prev>²+1
. Or suppose they just wanted to override the default arithmetic precedence to compute (<prev> - 3) * 2
. Both of these could easily be provided by allowing the use of, say, $
as an explicit "previous value" token. After adding it to the literals list, it would then only be necessary to modify the last parser function:
def p_expr_previous(p):
'''expr_cont :
| '$'
expression : '$'
'''
p[0] = r # "previous" would still be a better variable name than r