Search code examples
pythonparsingply

PLY parser - Unexpected behavior


I'm creating a calculator using PLY. I want to be able to compute numerical value, but also to store functions.

For any function given, let's say : fun(x) = x + 3 I store it in a dictionary where fun(x) is the key and x+3the value. Both key and value are stored as strings.

I can call the function like this fun(x) and it will return x+3. In this example I will replace the value 'x' by 9 (simulating the function call : fun(9)). It returns 12 as it should.

So, now my problem : when I try to add the return of this function to a NUMBER : fun(x) + 2 I will get a Syntax Error, although the 2 + fun(x) works perfectly fine ! I don't understand that at all.

This is my code simplified :

import ply.yacc as yacc
import ply.lex as lex

################################## LEXER ################################

tokens = (
    'FUNCTION',
    'VARIABLE',
    'NUMBER',
)

t_VARIABLE      = r'[a-zA-Z]+'
t_FUNCTION      = r'[a-zA-Z]{2,}\(([a-zA-Z]+)\)'
literals    = '+='
t_ignore    = " \t"

def t_NUMBER(t):
    r'(?:\d+(?:\.\d*)?)'
    t.value = int(t.value)
    return t

def t_error(t):
    print('Illegal character \'{}\''.format(t.value[0]))
    t.lexer.skip(1)

################################## PARSER ################################

functions = {}

def p_operations(p):
    """ expression : first
    first : NUMBER
    first : VARIABLE
    first : function
    """
    p[0] = p[1]

def p_plus(p):
    """ first : first '+' expression """
    if isinstance(p[1], str) or isinstance(p[3], str):
        p[0] = str(p[1]) + p[2] + str(p[3])
    else:
        p[0] = p[1] + p[3]

def p_function_assignation(p):
    '''expression : FUNCTION '=' expression'''
    functions[p[1]] = p[3]
    p[0] = p[3]


def p_function_expression(p):
    '''function : FUNCTION '''
    for key in functions.keys():
        if p[1] == key:
            p[0] = parser.parse(functions[key].replace('x', '9')). # I'm simulating here : fun(9)  
            break
    else:
        print("Variable '{}' not found".format(p[1]))

def p_error(t):
    print("Syntax error!")

################################## MAIN #################################

lexer = lex.lex() 
parser = yacc.yacc()

while True:
    try:
        question = raw_input('>>> ')
    except:
        question = input('>>> ')

    answer = parser.parse(question)
    if answer is not None:
        print(answer)

To test my code :

fun(x) = x + 3 => store the function into the dict

fun(x) => prints rightfully 12 (9 + 3)

2 + fun(x) => prints 14 as it should (2 + 12)

fun(x) + 2 => get an error !


Solution

  • Inside your p_function_expression you do a recursive call to the parse method:

    p[0] = parser.parse(functions[key].replace('x', '9'))
    

    (Note: as typed into your question, that line had a syntax error from an extraneous . following the final parenthesis.)

    That invocation of parse implicitly uses the global lexer; that is, the current value of lex.lexer (which is the last lexer created by lex.lex()). But Ply lexers (and, indeed, most lexers) are stateful; the lexer maintains the current input string and the current input position, along with some other information (such as the current lexer state if you're using multiple states).

    As a consequence, the recursive call to parse leaves the (global) lexer pointed at the end of a string. So when the outer parse tries to read the next token (actually the second next token since it already has the lookahead token), it gets an EOF from the lexer, which produces a syntax error.

    You can see this by enabling parser debugging:

    answer = parser.parse(question, debug=True)
    

    This issue is briefly described in the Ply manual where it is noted that you should clone the lexer for reentrant lexical analysis.

    Unfortunately, the Ply manual doesn't mention that the Ply parser object is also not reentrant. In the case of the parser, the reentrancy issues are somewhat less obvious; they really only apply during syntax error recovery (unless you store your own persistent data in the parser object, which you are allowed to do). The parser doesn't have a clone method, largely because the parse tables have already been precomputed and cached, so it's not as expensive to create a new parser as it is to create a new lexer.

    In short, what you needed was to do the inner parse with a fresh parser object which uses a fresh lexer object:

    p[0] = yacc.yacc().parse(functions[key].replace('x', '9'),
                             lexer=p.lexer.clone())
    

    (The parser object doesn't have a persistent attribute which stores the current lexer, but it is available in the argument passed to parser action functions as p.lexer. See the ever-helpful Ply manual.)

    Also, you really should look into the purpose of Python dictionaries. They are precisely designed so that you don't need to loop over all entries in order to find the key you want. You can lookup a key in a simple O(1) operation. A much simpler version (also considerably faster if you have several functions) is:

    def p_function_expression(p):
        '''function : FUNCTION '''
        if p[1] in functions:
            p[0] = yacc.yacc().parse(functions[p[1]].replace('x', '9'),
                                     lexer=p.lexer.clone())
        else:
            print("Variable '{}' not found".format(p[1]))
    

    That looks up the function name twice, which is still just two linear operations. But if you want to do the lookup only once, you could use the dictionary's get method, which returns a default value:

    def p_function_expression(p):
        '''function : FUNCTION '''
        funcbody = functions.get(p[1], None)
        if funcbody is not None:
            p[0] = yacc.yacc().parse(funcbody.replace('x', '9'),
                                     lexer=p.lexer.clone())
        else:
            print("Variable '{}' not found".format(p[1]))
    

    You could also do this by looking the name up inside a try block which catches KeyError.

    I think it should go without saying that this is not a good way to accomplish the task you have set yourself. It would be much better to represent function bodies as a preparsed AST, which could later be evaluated with the actual parameters. In this model, you don't actually require immediate evaluation at all; everything is parsed into an AST, and when (if) you want to evaluate the parsed text, you call the AST's eval method to do so.