Search code examples
pythonregexcontext-free-grammarpyparsing

Pyparsing - Finding Nested Polynomials


I'm searching through some algebra and trying to match all expressions of the form:

(subexp)^C

Where C is an integer and subexp can be one of two things:

a) it can be another expression of the form (subexp)^C b) It can be an expression of the form var1 op var2 op var3 ... op varn

where a var is of the form letters numbers, such as l2, cd3, hello53, etc. and an op is either -, *, or +. This second option can also have terms grouped by parens.

(There is no whitespace, I've just added whitespace above some places for clarity)

So, as an example:

(a12 + c33 + d34)^2
(a12 * c33 +-d34)^2
((a12 * c33)^5 + c3)^2

etc.

Expressions of this form will be embedded in a line of text. I need to find all instances of (subexp)^C, and replace them with pow(subexp, C). In short, I'm trying to convert some computer algebra to functioning C code.

I was originally doing this with regexp, but realized it wasn't a regular expression. For non-nested cases, the regexp is:

line = re.sub(r'\(([a-zA-Z_]+[0-9]+\+?-?\*?)+\)\^[0-9]+', replace_with_pow, line)

Here line is the line with the embedded polynomials, and replace_with_pow is the function that does replacement.

Unfortunately, when expressions can become nested, it's a CFG.

I have looked into pyparsing but have found the examples to be, well, difficult to parse, and the documentation lacking. That seems to be the recommended library though.

Can anyone provide an example script of how to find nested expressions and replace them? (It can be for a simplified problem if you'd like I can build off it)

EDIT: Update: with pyparsing, I can now parse all nested expressions that have the form { stuff ( ...)^C stuff }, using the following code:

closing = pyparsing.Word( ")^" + pyparsing.nums )
thecontent = pyparsing.Word(pyparsing.alphanums) | '+' | '-' | '*' | ',' 
parens     = pyparsing.nestedExpr( '(', closing, content=thecontent)

thecontent2 = thecontent | parens
parens2 = pyparsing.nestedExpr('{', '}', content=thecontent2)

res = parens2.parseString("{sdf(a + (a + b)^5 + c)asdf}")

This brings me to two questions:

a) when I have matched my ^5, the parser consumes it. How can I extract the ^5? b) Is there a quick / easy way to do replacement with pyparsing?


Solution

  • nestedExpr is not really the way to go here, that method in pyparsing is there when the items inside the nested punctuation are not too well-defined. In your case, you are better of defining your own nesting using pyparsing Forward()s. See below:

    from pyparsing import *
    
    # define arithmetic items
    identifier = Word(alphas, alphanums+'_')
    integer = Word(nums)
    real = Regex(r'\d+\.\d*')
    oper = oneOf("* + - /")
    
    # define arithOperand as a Forward, since it could contain nested power expression
    arithOperand = Forward()
    arithExpr = arithOperand + ZeroOrMore(oper + arithOperand)
    groupedArithExpr = '(' + arithExpr + ')'
    
    # define expression for x^y, where x could be any kind of arithmetic term
    powerOp = Literal('^')
    powerExpr = (groupedArithExpr|real|integer|identifier) + powerOp + integer
    powerExpr.setParseAction(lambda tokens: 'pow(%s,%s)' % (tokens[0], tokens[2]))
    
    # now define the possible expressions for arithOperand, including a powerExpr
    arithOperand <<= powerExpr | real | integer | identifier | groupedArithExpr
    
    # convert parsed list of strings to a single string
    groupedArithExpr.setParseAction(''.join)
    
    # show how transform string will apply parse actions as transforms
    print arithOperand.transformString("x = (4*(1 + 3^2) * a)^10")
    print
    

    prints x = pow((4*(1+pow(3,2))*a),10)

    arithExpr.runTests("""\
        (a12 + c33 + d34)^2
        (a12 * c33 +-d34)^2
        (a12 * (c33 + c3))^2
        (a12 * (c33 + c3)^4)^2
        ((a12 * c33 + 12)^5 + c3)^2""")
    

    prints

    (a12 + c33 + d34)^2
    ['pow((a12+c33+d34),2)']
    
    (a12 * c33 +-d34)^2
           ^
    Expected ")" (at char 11), (line:1, col:12)
    
    (a12 * (c33 + c3))^2
    ['pow((a12*(c33+c3)),2)']
    
    (a12 * (c33 + c3)^4)^2
    ['pow((a12*pow((c33+c3),4)),2)']
    
    ((a12 * c33 + 12)^5 + c3)^2
    ['pow((pow((a12*c33+12),5)+c3),2)']
    

    Note the use of transformString above - this will search your source code for matches and splice the modified code back in where the match was found.