python string python-3.x split tokenize

How can I split a string containing a mathematical expression in Python?


I wrote a program that converts infix to postfix in Python. The problem is how the input is handled. If I enter something like this (as a string):

( ( 73 + ( ( 34 - 72 ) / ( 33 - 3 ) ) ) + ( 56 + ( 95 - 28 ) ) )

it is split with .split() and the program works correctly. But I want the user to be able to enter something like this:

((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )

As you can see, I want whitespace to be insignificant, while the program still splits the string into parentheses, integers (not digits) and operators.

I tried to solve it with a for loop, but I don't know how to capture a whole number (73, 34, 72) instead of one digit at a time (7, 3, 3, 4, 7, 2).

To sum up, I want to split a string like ((81 * 6) /42+ (3-1)) into:

[(, (, 81, *, 6, ), /, 42, +, (, 3, -, 1, ), )]

Solution

  • Tree with ast

    You could use ast to get a tree of the expression:

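    import ast
    
    source = '((81 * 6) /42+ (3-1))'
    node = ast.parse(source)
    
    def show_children(node, level=0):
        # ast.Num was removed in Python 3.12; numeric literals are
        # ast.Constant nodes (Python 3.8+), with the number in .value
        if isinstance(node, ast.Constant):
            print(' ' * level + str(node.value))
        else:
            print(' ' * level + str(node))
        # Recurse into the children, indenting one space per level
        for child in ast.iter_child_nodes(node):
            show_children(child, level+1)
    
    show_children(node)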
    

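    It outputs something like this (the memory addresses and the exact class prefix vary between runs and Python versions):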

    <_ast.Module object at 0x7f56abbc5490>
     <_ast.Expr object at 0x7f56abbc5350>
      <_ast.BinOp object at 0x7f56abbc5450>
       <_ast.BinOp object at 0x7f56abbc5390>
        <_ast.BinOp object at 0x7f56abb57cd0>
         81
         <_ast.Mult object at 0x7f56abbd0dd0>
         6
        <_ast.Div object at 0x7f56abbd0e50>
        42
       <_ast.Add object at 0x7f56abbd0cd0>
       <_ast.BinOp object at 0x7f56abb57dd0>
        3
        <_ast.Sub object at 0x7f56abbd0d50>
        1
    

    As @user2357112 wrote in the comments: ast.parse interprets Python syntax, not mathematical expressions. (1+2)(3+4) would be parsed as a function call, and list comprehensions would be accepted even though they probably shouldn't be considered valid mathematical expressions.
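    One way to guard against that caveat is to walk the tree and reject any node type that isn't plain arithmetic. The following is only a sketch: the function name is_plain_arithmetic and the contents of ALLOWED_NODES are assumptions made for illustration, not something from the original answer, and it assumes Python 3.8 or later.

```python
import ast

# Node types treated as "plain arithmetic" in this sketch (an assumption;
# extend the tuple if you also want e.g. ast.Pow or ast.Mod).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)

def is_plain_arithmetic(source):
    """Return True if source parses and uses only whitelisted node types."""
    try:
        # mode='eval' accepts a single expression; strip() avoids an
        # "unexpected indent" error on leading whitespace
        tree = ast.parse(source.strip(), mode='eval')
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        # Constants must be numbers, not strings, None, booleans, ...
        if isinstance(node, ast.Constant) and (
            isinstance(node.value, bool)
            or not isinstance(node.value, (int, float))
        ):
            return False
    return True

print(is_plain_arithmetic('((81 * 6) /42+ (3-1))'))  # True
print(is_plain_arithmetic('(1+2)(3+4)'))             # False (function call)
print(is_plain_arithmetic('[x for x in range(3)]'))  # False (comprehension)
```

    This only restricts which constructs are accepted. Implicit multiplication such as (1+2)(3+4) is rejected rather than interpreted.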

  • List with a regex

    If you want a flat structure, a regex could work:

    import re
    
    source = '((81 * 6) /42+ (3-1))'
    # Raw string, so that \d is not treated as an invalid string escape
    number_or_symbol = re.compile(r'(\d+|[^ 0-9])')
    print(number_or_symbol.findall(source))
    # ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
    

    It looks for either:

    • one or more consecutive digits
    • or any single character which isn't a digit or a space
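    The same matching rule can also be written as an explicit loop, which is closer to the for-loop attempt described in the question. This is a minimal sketch, not part of the original answer; the function name tokenize is made up for illustration. The idea is to keep collecting digits in a buffer and to emit the buffer as one token as soon as a non-digit shows up.

```python
def tokenize(expression):
    """Split an expression into integer strings and single-character symbols."""
    tokens = []
    number = ''  # digits of the integer currently being read
    for char in expression:
        if char.isdigit():
            # Still inside a number: keep accumulating
            number += char
            continue
        if number:
            # A non-digit ends the current number
            tokens.append(number)
            number = ''
        if not char.isspace():
            # Every other non-blank character is a token on its own
            tokens.append(char)
    if number:
        # Flush a number that ends the string
        tokens.append(number)
    return tokens

print(tokenize('((81 * 6) /42+ (3-1))'))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
```

    Unlike the character class [^ 0-9], this version skips every kind of whitespace (tabs and newlines too), not only spaces.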

    Once you have a list of elements, you could check whether the syntax is correct, for example by using a stack to check that the parentheses match, or by checking that every element is a known token.
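    Both checks could be sketched roughly as follows. The names check_tokens and OPERATORS, and the choice of the four operators, are assumptions made for this example. It only checks parentheses and token kinds, not whether operators and operands appear in a valid order.

```python
OPERATORS = {'+', '-', '*', '/'}  # assumed operator set for this sketch

def check_tokens(tokens):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    open_positions = []  # stack holding the index of each unmatched '('
    for index, token in enumerate(tokens):
        if token == '(':
            open_positions.append(index)
        elif token == ')':
            if open_positions:
                open_positions.pop()  # closes the most recent '('
            else:
                problems.append(f"unmatched ')' at token {index}")
        elif not (token.isdigit() or token in OPERATORS):
            problems.append(f"unknown token {token!r} at token {index}")
    # Anything left on the stack was never closed
    for index in open_positions:
        problems.append(f"unmatched '(' at token {index}")
    return problems

tokens = ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
print(check_tokens(tokens))                # []
print(check_tokens(['(', '3', '$', '1']))
# ["unknown token '$' at token 2", "unmatched '(' at token 0"]
```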