
Add function parsing to simple pyparsing with non-numeric arguments


I am trying to add functions to expressions, with the ability to accept non-numeric argument(s). Following "Add function parsing to simple pyparsing arithmetics grammar" and https://pyparsing.wikispaces.com/file/view/fourFn.py, I managed to support functions with numeric inputs, but I couldn't extend them to non-numeric inputs. Here is my test code, which attempts to allow ident as well as expr as a function argument:

# fourFn.py
#
# Demonstration of the pyparsing module, implementing a simple 4-function expression parser,
# with support for scientific notation, and symbols for e and pi.
# Extended to add exponentiation and simple built-in functions.
# Extended test cases, simplified pushFirst method.
#
# Copyright 2003-2006 by Paul McGuire
#
from pyparsing import Literal,CaselessLiteral,Word,Combine,Group,Optional,\
    ZeroOrMore,Forward,nums,alphas,delimitedList
import math
import operator
import pprint 

exprStack = []

def pushFirst( strg, loc, toks ):
    exprStack.append( toks[0] )
def pushUMinus( strg, loc, toks ):
    if toks and toks[0]=='-': 
        exprStack.append( 'unary -' )
        #~ exprStack.append( '-1' )
        #~ exprStack.append( '*' )

bnf = None
def BNF():
    """
    expop   :: '^'
    multop  :: '*' | '/'
    addop   :: '+' | '-'
    integer :: ['+' | '-'] '0'..'9'+
    atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
    factor  :: atom [ expop factor ]*
    term    :: factor [ multop factor ]*
    expr    :: term [ addop term ]*
    """
    global bnf
    if not bnf:
        point = Literal( "." )
        #e     = CaselessLiteral( "E" )
        fnumber = Combine( Word( "+-"+nums, nums ) + 
                           Optional( point + Optional( Word( nums ) ) ) 
                            )
        ident = Word(alphas, alphas+nums)

        plus  = Literal( "+" )
        minus = Literal( "-" )
        mult  = Literal( "*" )
        div   = Literal( "/" )
        lpar  = Literal( "(" ).suppress()
        rpar  = Literal( ")" ).suppress()
        addop  = plus | minus
        multop = mult | div
        expop = Literal( "^" )
        #pi    = CaselessLiteral( "PI" )
        expr = Forward()
        #function_calla = Group(ident + lpar + ident + rpar)
        function_call = Group(ident + lpar + Group(Optional(delimitedList(ident|expr))) + rpar)
        atom = (Optional("-") + ( fnumber | ident + lpar + expr + rpar |function_call).setParseAction( pushFirst ) | ( lpar + expr.suppress() + rpar )).setParseAction(pushUMinus) 
        # by defining exponentiation as "atom [ ^ factor ]..." instead of "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + ZeroOrMore( ( expop + factor ).setParseAction( pushFirst ) )

        term = factor + ZeroOrMore( ( multop + factor ).setParseAction( pushFirst ) )
        expr << term + ZeroOrMore( ( addop + term ).setParseAction( pushFirst ) ) 
        bnf = expr
    return bnf
def asw(x):
    print "X is ",x
    return 1
# map operator symbols to corresponding arithmetic operations
epsilon = 1e-12
opn = { "+" : operator.add,
        "-" : operator.sub,
        "*" : operator.mul,
        "/" : operator.truediv,
        "^" : operator.pow }
fn  = { "sin" : math.sin,
        "cos" : math.cos,
        "tan" : math.tan,
        "abs" : abs,
        "trunc" : lambda a: int(a),
        "round" : round,
        "asw" : asw,
        "sgn" : lambda a: abs(a)>epsilon and cmp(a,0) or 0}
def evaluateStack( s ):
    print "s=",s
    op = s.pop()
    print "op=",op,type(op)
    if op == 'unary -':
        return -evaluateStack( s )
    #if type(op)!=str:
    #    return evaluateStack( s )

    if op in "+-*/^":
        op2 = evaluateStack( s )
        op1 = evaluateStack( s )
        return opn[op]( op1, op2 )
    elif op == "PI":
        return math.pi # 3.1415926535
    elif op == "E":
        return math.e  # 2.718281828
    elif op in fn:
        return fn[op]( evaluateStack( s ) )
    elif op[0].isalpha():
        return 0
    else:
        return float( op )
if __name__ == "__main__":      
    exprStack = []  
    res= BNF().parseString( "asw(aa)").asList()
    print "res=",res
    val = evaluateStack( exprStack[:] )
    print val

For a numeric argument, the output is:

C:\temp>python test.py
res= ['asw', '11']
s= ['11', 'asw']
op= asw <type 'str'>
s= ['11']
op= 11 <type 'str'>
X is  11.0
1

A non-numeric argument, on the other hand, results in:

C:\temp>python test.py
res= [['asw', ['aa']]]
s= [(['asw', (['aa'], {})], {})]
op= ['asw', ['aa']] <class 'pyparsing.ParseResults'>
Traceback (most recent call last):
  File "test.py", line 117, in <module>
    val = evaluateStack( exprStack[:] )
  File "test.py", line 99, in evaluateStack
    if op in "+-*/^":
TypeError: 'in <string>' requires string as left operand, not ParseResults
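The TypeError can be reproduced in isolation. The minimal sketch below assumes pyparsing is installed; `seen`, `grouped`, `flat` and `args` are illustrative names, not from the question. It attaches a pushFirst-style parse action to a grouped and an ungrouped version of function_call and records what toks[0] is in each case.

```python
from pyparsing import (Group, Optional, ParseResults, Suppress, Word,
                       alphanums, alphas, delimitedList)

seen = []  # stands in for the question's exprStack


def pushFirst(strg, loc, toks):
    # Same idea as the question's pushFirst: keep only the first token.
    seen.append(toks[0])


ident = Word(alphas, alphanums)
lpar, rpar = Suppress("("), Suppress(")")
args = Group(Optional(delimitedList(ident)))

# Grouped: the whole call becomes one nested ParseResults, so toks[0] is that group.
grouped = Group(ident + lpar + args + rpar).setParseAction(pushFirst)
# Ungrouped: toks[0] is just the function name.
flat = (ident + lpar + args + rpar).setParseAction(pushFirst)

grouped.parseString("asw(aa)")
flat.parseString("asw(aa)")

print(isinstance(seen[0], ParseResults), seen[0].asList())  # True ['asw', ['aa']]
print(repr(seen[1]))                                        # 'asw'

# A ParseResults on the left of `in` with a str on the right raises the
# same TypeError that evaluateStack hits in the traceback above.
try:
    seen[0] in "+-*/^"
except TypeError as err:
    print("TypeError:", err)
```

Only the ungrouped version leaves a plain string on the stack, which is what evaluateStack expects.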

Where am I going wrong? I'm very new to pyparsing and still trying to figure it out.


Solution

  • I'm pretty sure the culprits are the way you add function_call into atom (at the end of the alternatives) and the fact that you make function_call a Group.

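    # not a Group: the argument exprs push themselves first, then pushFirst pushes the function name
    function_call = (ident + lpar + Group(Optional(delimitedList(expr))) + rpar)
    # function_call is tried before the bare ident because '|' builds a MatchFirst
    atom = (Optional("-") + ( fnumber | function_call | ident).setParseAction( pushFirst ) | ( lpar + expr.suppress() + rpar )).setParseAction(pushUMinus) 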
    

    Remember that '|' generates MatchFirst expressions, which take the first alternative that matches, in the order listed. If you define atom to try ident first, you will never match a function_call, since a function call starts with an identifier. You must first check whether the identifier you are parsing is the start of a function call before deciding that it is a lone identifier.

    Also, function_call must not be a Group. This parser works by parsing and pushing all the arguments onto the stack, and then finally pushing the function name. That only works if function_call is not a Group; otherwise pushFirst pushes the entire parsed function call, as a single ParseResults, onto the stack, and that is the object evaluateStack receives in your traceback.

    Finally, note that I reduced function_call's arguments to plain exprs instead of ident | expr. A lone ident is already an expr, so the alternation is unnecessary, and it actually breaks things again for the same reason that listing ident ahead of function_call in atom does: with ident tried first, an argument such as aa+1 or f(x) stops after the bare identifier, and the following rpar then fails to match.
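Putting the three changes together, here is a minimal Python 3 sketch of the corrected parser and evaluator. Assumptions: pyparsing is installed; the camelCase names (delimitedList, setParseAction, parseString) are kept to match the question and should still be available in pyparsing 3 as compatibility aliases; `evaluate` is a hypothetical convenience wrapper; like the question's fn table, the evaluator assumes every function takes exactly one argument; and unknown identifiers evaluate to 0, as in the question.

```python
import math
import operator

from pyparsing import (Combine, Forward, Group, Literal, Optional, Word,
                       ZeroOrMore, alphas, delimitedList, nums)

exprStack = []


def pushFirst(strg, loc, toks):
    exprStack.append(toks[0])


def pushUMinus(strg, loc, toks):
    if toks and toks[0] == "-":
        exprStack.append("unary -")


def BNF():
    point = Literal(".")
    fnumber = Combine(Word("+-" + nums, nums) +
                      Optional(point + Optional(Word(nums))))
    ident = Word(alphas, alphas + nums)
    lpar = Literal("(").suppress()
    rpar = Literal(")").suppress()
    addop = Literal("+") | Literal("-")
    multop = Literal("*") | Literal("/")
    expop = Literal("^")
    expr = Forward()
    # No Group: each argument expr pushes itself first, then pushFirst pushes
    # the function name. Arguments are plain exprs (a lone ident is an expr).
    function_call = ident + lpar + Group(Optional(delimitedList(expr))) + rpar
    # '|' builds a MatchFirst, so function_call must be tried before ident.
    atom = (Optional("-") +
            (fnumber | function_call | ident).setParseAction(pushFirst) |
            (lpar + expr.suppress() + rpar)).setParseAction(pushUMinus)
    factor = Forward()
    factor << atom + ZeroOrMore((expop + factor).setParseAction(pushFirst))
    term = factor + ZeroOrMore((multop + factor).setParseAction(pushFirst))
    expr << term + ZeroOrMore((addop + term).setParseAction(pushFirst))
    return expr


def asw(x):
    print("X is", x)
    return 1


opn = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}
fn = {"sin": math.sin, "cos": math.cos, "abs": abs, "asw": asw}


def evaluateStack(s):
    op = s.pop()
    if op == "unary -":
        return -evaluateStack(s)
    if op in opn:
        op2 = evaluateStack(s)
        op1 = evaluateStack(s)
        return opn[op](op1, op2)
    if op == "PI":
        return math.pi
    if op == "E":
        return math.e
    if op in fn:
        # One-argument functions only, as in the question's fn table.
        return fn[op](evaluateStack(s))
    if op[0].isalpha():
        return 0  # unknown identifier such as "aa" evaluates to 0, as in the question
    return float(op)


bnf = BNF()


def evaluate(text):
    """Hypothetical helper: parse text, then evaluate a copy of the stack."""
    exprStack.clear()
    bnf.parseString(text, parseAll=True)
    return evaluateStack(exprStack[:])


if __name__ == "__main__":
    print(evaluate("asw(aa)"))      # prints "X is 0", then 1
    print(evaluate("2*abs(aa-4)"))  # 8.0
```

With these changes the stack for 2*abs(aa-4) is flat, ['2', 'aa', '4', '-', 'abs', '*'], so evaluateStack only ever sees strings. Supporting functions with several arguments would additionally require recording how many arguments each call consumed, since the stack alone does not say.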