Tags: python, regex, python-2.7, abstract-syntax-tree, pyparsing

Changing the ** operator to a power function using parsing?


My requirement is to change the ** operator to a power function.

For example:

1. Input - "B**2"
   Output - power(B,2)
2. Input - "B**2&&T**2*X"
   Output - power(B,2)&&power(T,2)*X

I wrote the following regular expression to address the problem:

    rx = r"([a-zA-Z0-9]+)\*\*([a-zA-Z0-9()]+)"
    result = regex.sub(rx, r"power(\1,\2)", expression, 0, regex.IGNORECASE | regex.MULTILINE)

The code above successfully converts expressions like examples 1 and 2, but it fails on expressions like (a+1)**2 or ((a+b)*c)**2. I realize a regular expression is not the best way to handle such cases and that parsing would be a better fit. I am a bit new to Python. Please guide me on how to approach this problem.
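The limitation can be reproduced with a short sketch. It assumes the standard-library `re` module stands in for `regex` above, and `to_power` is a hypothetical helper name used only for illustration. Because the first group accepts only a run of letters and digits, a base that ends in `)` never matches and the text comes back unchanged.

```python
import re

# Pattern from the question: the base must be a plain alphanumeric run.
rx = r"([a-zA-Z0-9]+)\*\*([a-zA-Z0-9()]+)"


def to_power(expression):
    # Hypothetical helper wrapping the substitution from the question.
    return re.sub(rx, r"power(\1,\2)", expression)


for text in ["B**2", "B**2&&T**2*X", "(a+1)**2", "((a+b)*c)**2"]:
    converted = to_power(text)
    status = "converted" if "**" not in converted else "NOT converted"
    print("%-14s -> %-26s %s" % (text, converted, status))
```

The two parenthesized inputs are reported as not converted, since a regular expression of this shape cannot find the matching opening parenthesis of the base.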


Solution

  • This sounds very familiar. I think I dealt with a similar problem on the pyparsing mailing list, but I can't find it at the moment. Try something like this:

    from pyparsing import *
    
    # define some basic operand expressions
    number = Regex(r'\d+(\.\d*)?([Ee][+-]?\d+)?')
    ident = Word(alphas+'_', alphanums+'_')
    
    # forward declare our overall expression, since a slice could 
    # contain an arithmetic expression
    expr = Forward()
    slice_ref = '[' + expr + ']'
    
    # define our arithmetic operand
    operand = number | Combine(ident + Optional(slice_ref))
    
    # parse actions to convert parsed items
    def convert_to_pow(tokens):
        tmp = tokens[0][:]
        ret = tmp.pop(-1)
        tmp.pop(-1)
        while tmp:
            base = tmp.pop(-1)
            # hack to handle '**' precedence ahead of '-'
            if base.startswith('-'):
                ret = '-pow(%s,%s)' % (base[1:], ret)
            else:
                ret = 'pow(%s,%s)' % (base, ret)
            if tmp:
                tmp.pop(-1)
        return ret
    
    def unary_as_is(tokens):
        return '(%s)' % ''.join(tokens[0])
    
    def as_is(tokens):
        return '%s' % ''.join(tokens[0])
    
    # simplest infixNotation - may need to add a few more operators, but start with this for now
    arith_expr = infixNotation( operand,
        [
        ('-', 1, opAssoc.RIGHT, as_is),
        ('**', 2, opAssoc.LEFT, convert_to_pow),
        ('-', 1, opAssoc.RIGHT, unary_as_is),
        (oneOf("* /"), 2, opAssoc.LEFT, as_is),
        (oneOf("+ -"), 2, opAssoc.LEFT, as_is),
        ])
    
    # now assign into forward-declared expr
    expr <<= arith_expr.setParseAction(lambda t: '(%s)' % ''.join(t))
    
    assert "2**3" == expr
    assert "2**-3" == expr
    
    # test it out
    tests = [
        "2**3",
        "2**-3",
        "2**3**x5",
        "2**-3**x6[-1]",
        "2**-3**x5+1",
        "(a+1)**2",
        "((a+b)*c)**2",
        "B**2",
        "-B**2",
        "(-B)**2",
        "B**-2",
        "B**(-2)",
        "B**2&&T**2*X",
        ]
    
    x5 = 2
    a,b,c = 1,2,3
    B = 4
    x6 = [3,2]
    for test in tests:
        print test
        xform = expr.transformString(test)[1:-1]
        print xform
        print '**' not in xform and eval(xform) == eval(test)
        print
    

    prints:

    2**3
    pow(2,3)
    True
    
    2**-3
    pow(2,-3)
    True
    
    2**3**x5
    pow(2,pow(3,x5))
    True
    
    2**-3**x6[-1]
    pow(2,-pow(3,x6[((-1))]))
    True
    
    2**-3**x5+1
    pow(2,-pow(3,x5))+1
    True
    
    (a+1)**2
    pow((a+1),2)
    True
    
    ((a+b)*c)**2
    pow(((a+b)*c),2)
    True
    
    B**2
    pow(B,2)
    True
    
    -B**2
    (-pow(B,2))
    True
    
    (-B)**2
    pow(((-B)),2)
    True
    
    B**-2
    pow(B,-2)
    True
    
    B**(-2)
    pow(B,((-2)))
    True
    
    B**2&&T**2*X
    pow(B,2))&&(pow(T,2)*X
    Traceback (most recent call last):
      File "convert_to_pow.py", line 85, in <module>
        print '**' not in xform and eval(xform) == eval(test)
      File "<string>", line 1
        pow(B,2))&&(pow(T,2)*X
                ^
    SyntaxError: invalid syntax
    

    If you have more corner cases in the code that you are converting, it will probably just need a bit more tweaking of the operand expression, or adding more operators (like &&) to the infixNotation expression.

    (Note that you have to convert a**b**c as if written a**(b**c), as chained exponentiation is evaluated right-to-left, not left-to-right.)
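The grouping rule in the note can be checked directly in Python. This is only a small sanity check with arbitrarily chosen values, not part of the converter itself.

```python
# Chained ** groups from the right: a**b**c means a**(b**c).
a, b, c = 2, 3, 2

right_grouped = a ** (b ** c)   # 2 ** 9
left_grouped = (a ** b) ** c    # 8 ** 2

assert a ** b ** c == right_grouped == 512
assert left_grouped == 64

# The nested call must therefore put the inner pow in the exponent position.
assert pow(a, pow(b, c)) == a ** b ** c
assert pow(pow(a, b), c) != a ** b ** c
print("a**b**c converts to pow(a, pow(b, c))")
```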

    EDIT:

    Introduced hack to properly handle precedence between '-' and '**'. Expanded tests to actually evaluate before/after strings. This looks more solid now.
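As an alternative to handling precedence by hand, the standard-library `ast` module can do the parsing when every piece of the input is a valid Python expression, since Python's own parser already applies the rules for `-` and `**`. The following is a minimal sketch of that different technique, not the pyparsing approach above. It assumes Python 3.9+ (for `ast.unparse`), whereas the thread targets Python 2.7. The names `PowToCall`, `convert_python_expr` and `convert` are hypothetical, and `&&` is handled only under the stated assumption that it never appears inside parentheses or brackets.

```python
import ast


class PowToCall(ast.NodeTransformer):
    """Rewrite every `x ** y` node as a call `power(x, y)`."""

    def visit_BinOp(self, node):
        # Convert the children first so nested powers are rewritten too.
        self.generic_visit(node)
        if isinstance(node.op, ast.Pow):
            return ast.Call(
                func=ast.Name(id="power", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
        return node


def convert_python_expr(source):
    """Convert one valid Python expression; raises SyntaxError otherwise."""
    tree = PowToCall().visit(ast.parse(source.strip(), mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


def convert(source, separator="&&"):
    # Assumption: `separator` is not Python syntax and never occurs inside
    # parentheses or brackets, so each piece is a complete expression.
    pieces = source.split(separator)
    return separator.join(convert_python_expr(piece) for piece in pieces)


for text in ["(a+1)**2", "2**3**x5", "-B**2", "B**2&&T**2*X"]:
    print(text, "->", convert(text))
```

The output is regenerated from the syntax tree, so spacing and redundant parentheses in the input are not preserved. If the original layout matters, the `transformString` approach above is the better fit.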