PLY parser - equation assignation

I'm creating a calculator using PLY. I want to be able to assign equation to a dictionary as I am able to assign variables to another dictionary.

The way I'm assigning variables : x = 10 (in the dict xwill be the key and 10the value)

The way I want to be able to assign equation : fun(x) = x + 42 (in the dict funwill be the key and a tuple ('x', 'x+10') will be the value).

It's working with this writing fun|x| = x + 42 (note the 'pipe' symbol here). But it's not working with parentheses.

How can I make it work the right way ?

Here's my code so far :

import ply.yacc as yacc
import ply.lex as lex

################################## LEXER ################################

tokens = (
    'NAME',
    'NUMBER',
)

t_NAME      = r'[a-zA-Z]+'
literals    = '+=-*/|()'
t_ignore    = " \t"

def t_NUMBER(t):
    r'(?:\d+(?:\.\d*)?)'
    t.value = int(t.value)
    return t

def t_error(t):
    print('Illegal character \'{}\''.format(t.value[0]))
    t.lexer.skip(1)

################################## PARSER ################################

functions = {}
variables = {}

def p_operations(p):
    """ statement : expression
    expression : fifth
    fifth : fourth
    fourth : third
    third : second
    second : first
    first : NUMBER
    """
    p[0] = p[1]

def p_plus(p):
    """ fifth : fifth '+' fourth """
    if isinstance(p[1], str) or isinstance(p[3], str):
        p[0] = str(p[1]) + p[2] + str(p[3])
    else:
        p[0] = p[1] + p[3]

def p_minus(p):
    """ fifth : fifth '-' fourth """
    if isinstance(p[1], str) or isinstance(p[3], str):
        p[0] = str(p[1]) + p[2] + str(p[3])
    else:
        p[0] = p[1] - p[3]

def p_implicit_times(p):
    """ fourth : fourth second """
    if isinstance(p[1], str) or isinstance(p[2], str):
        p[0] = str(p[1]) + str(p[2])
    else:
        p[0] = p[1] * p[2]

def p_times(p):
    """ fourth : fourth '*' third """
    if isinstance(p[1], str) or isinstance(p[3], str):
        p[0] = str(p[1]) + p[2] + str(p[3])
    else:
        p[0] = p[1] * p[3]

def p_divide(p):
    """ fourth : fourth '/' third """
    if isinstance(p[1], str) or isinstance(p[3], str):
        p[0] = str(p[1]) + p[2] + str(p[3])
    else:
        p[0] = p[1] / p[3]

def p_unary_minus(p):
    """ third : '-' third """
    if isinstance(p[2], str):
        p[0] = '-' + p[2]
    else:
        p[0] = -p[2]

def p_power(p):
    """ second : first '^' third """
    if isinstance(p[1], str) or isinstance(p[3], str):
        p[0] = str(p[1]) + p[2] + str(p[3])
    else:
        p[0] = p[1] ** p[3]

def p_block(p):
    """ first : '(' expression ')' """
    p[0] = p[2]


################################ PROBLEM HERE ############################

def p_statement_assign(p):
    ''' statement : NAME '=' expression '''
    variables[p[1]] = p[3]
    p[0] = p[3]

def p_function_assign(p):
    ''' statement : NAME '|' expression '|' '=' expression '''
    functions[p[1]] = (p[3], p[6])
    p[0] = functions[p[1]]

def p_variable_expr(p):
    ''' first : NAME '''
    try : 
        p[0] = variables[p[1]]
    except:
        p[0] = p[1]

def p_error(t):
    print("Syntax error!")

################################## MAIN #################################

lexer = lex.lex() 
parser = yacc.yacc()

while True:
    try:
        question = raw_input('>>> ')
    except:
        question = input('>>> ')

    answer = parser.parse(question)
    if answer is not None:
        print(answer)

Solution

You allow implicit multiplication. That means that f(x) can be parsed as a product, in which case f would have to be reduced to fourth as per the implicit multiplication rule. But if it is to be parsed as an assignment, it needs to be left as a NAME. That's a shift-reduce conflict, which can easily be seen in parser.out:

state 3

    (16) statement -> NAME . = expression
    (17) statement -> NAME . ( expression ) = expression
    (18) first -> NAME .

  ! shift/reduce conflict for ( resolved as shift

What you see here is that when the parser sees NAME followed by (, it doesn't know whether to reduce NAME to first (and subsequently to second, etc. up to fourth), in the expectation that the statement is a simple computation, or to shift the (, thereby committing itself to treating this as a function definition.

You might already have experienced a similar problem, since a natural grammar for function definition is:

statement : NAME '(' NAME ')' '=' expression

but you've replaced the second NAME with expression. That will avoid the shift-reduce conflict before the ), at the cost of accepting questionable function definitions (f(a+3) = 2a).

It's possible to do something similar to avoid this shift-reduce conflict (but it's a very ad hoc solution):

statement : fourth '(' expression ')' '=' expression

That "works" in the sense that it accepts correct expressions. But it also silently (or somewhat silently) accepts quite a few other expressions:

This one is fine:

>>> f(a) = a + 3
('a', 'a+3')

But these are weird:

>>> -f(a) = a + 3
('a', 'a+3')
>>> 3f(a) = a + 3
('a', 'a+3')
>>> 3f(a+2) = a + 3
('a+2', 'a+3')

Alternatively, you could ignore the shift-reduce conflict, since PLY will, by default, do the shift (as it says in parser.out: "resolved as shift"). That will prevent the parser from accepting the weird examples above, but it will also incorrectly parse some expressions which might seem reasonable:

These ones seem appropriate:

>>> f(a) = a + 3
('a', 'a+3')
>>> -f(a) = a + 3
Syntax error!
a+3
>>> 3f(a) = a + 3
Syntax error!
a+3

But we might expect this to print 105:

>>> a=7
7
>>> a(2a+1)
Syntax error!

If that doesn't bother you, you can stop reading now.

Your grammar is not ambiguous, and nor would it be ambiguous if you'd written the definition syntax with greater specificity:

statement : NAME '(' NAME ')' '=' expression

or, if you wanted to allow functions with multiple arguments:

statement : NAME '(' name_list ')' '=' expression
name_list : NAME
name_list : name_list ',' NAME

But it is not LR(1), and it will be very difficult to make it LR(1). Both of the syntaxes suggested above are LR(4), and since every LR(k) grammar can theoretically be converted mechanically into a (very bloated and hardly readable) LR(1) grammar, a solution must exist. It won't be pretty, though.

(The actual syntax you are using, with expression, is not LR(k) for any k, because expression can be arbitrarily long and the parser has to see beyond the argument list in order to decide whether or not to reduce the first NAME.)

Since the grammar is unambiguous, you could parse it with a GLR/GLL/Earley/etc. parser, but PLY doesn't produce those and I don't know of a Python parser generator which does (although that doesn't mean that one doesn't exist). There are various GLR parser generators available for other languages.

With PLY, though, your best bet is probably to use the general syntax shown above as an ad hoc solution and then solve the problem of accepting incorrect definitions by doing a check in the semantic action.

However, that check will be a bit tricky unless you bite the bullet and change to a parser which produces ASTs instead of doing immediate evaluation, as we have discussed several times. If the semantic values are ASTs, then it will be straight-forward to verify that both p[1] and p[3] are simple names in the action function for

statement : fourth '(' expression ')' '=' expression

I suppose, human ingenuity being limitless, that you might find some other hack which lets you do the test. But don't expect much sympathy when it fails on a corner case.