Search code examples
pythonmathsimplify

how to simplify huge arithmatic expression?


I've a huge expression like:

x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52] + FUNC1(z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51] + FUNC0(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49)) + FUNC2(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49), a49, x49) + y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51] + FUNC1(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49) + w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC1(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53]) + RET(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53], x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + movr[54] + m1[54]) + RET(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49) + w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC1(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53]) + RET(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) ...

I need to simplify this expression by replacing some part of repeating expression with other variables. For example:

a = RET(v49, t49, z49) 
b= w49 + h49 + FUNC1(v49) + a + movr[50] + m1[50] 
and so on...

my problem is; this is really huge expression (like 2MB long expression) and doing this manually is near impossible and also without mistakes.

now my question is; is there any app that'll do such thing? or any python program to do so?

I can program python easily, but I lack of such algorithm knowing.

any help appreciated.


Solution

  • The following function extracts all function calls and puts them into variables.

    def simplify(progstr, variable_prefix='x'):
        progstr = f' {progstr} '
        prog = []
        while progstr.count('(') > 0:
            for i, c in enumerate(progstr):
                if c == ')':
                    c2, i2 = None, i
                    while c2 != '(':
                        i2 -= 1
                        c2 = progstr[i2]
                    i2 -= 1
                    c2 = progstr[i2]
                    while c2 not in [',', ' ', '(', ')']:
                        i2 -= 1
                        c2 = progstr[i2]
                    variable = progstr[i2+1:i+1]
                    vname = f'{variable_prefix}{str(len(prog))}'
                    progstr = progstr.replace(variable, vname)
                    prog.append(f'{vname} = {variable}')
                    break
        prog.append(progstr[1:-1])
        return '\n'.join(prog)
    
    expression = 'x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]'
    
    print(simplify(expression, 'x'))
    

    prints

    x0 = FUNC1(v49)
    x1 = RET(v49, t49, z49)
    x2 = FUNC1(w49 + h49 + x0 + x1 + movr[50] + m1[50])
    x3 = RET(w49 + h49 + x0 + x1 + movr[50] + m1[50], v49, t49)
    x4 = FUNC1(y49 + z49 + x2 + x3 + movr[51] + m1[51])
    x5 = RET(y49 + z49 + x2 + x3 + movr[51] + m1[51], w49 + h49 + x0 + x1 + movr[50] + m1[50], v49)
    t49 + x4 + x5 + movr[52] + m1[52]
    

    Next to making the code more readable, this allows avoiding much repeated computation, which should speed it up a lot (especially if the individual function calls are costly), e.g. here FUNC1(v49) gets executed only once as opposed to 5 times.

    (Edit): How it works:

    While there are parentheses in the expression, do the following: Go through the expression from left to right, until you encounter a closing bracket (call this location j), then walk to the left until you encounter an opening bracket, then walk to the left until you encounter a whitespace, comma or bracket (and call this location i). The segment expression[i:j] then marks the first function call. Then simply replace each occurrence expression[i:j] in expression by a variable name x and add x = expression[i:j] to your list of variables.

    Some remarks on the code:

    • it is only briefly tested on some valid and well-formatted code, e.g. it assumes a space after each comma, a space around each operator (+, *, ..., no space), no space before a closing bracket, no space after an opening bracket, ...),
    • it also assumes that the variable prefix does not collide with the variable and function names in the expression (can be made more robust either by knowing about the names and providing some known-to-be safe prefix or by adjusting the code to itself select a prefix that is valid),
    • it further assumes that f(*args) (for a function f and some constant arguments args) always returns the same result, no matter what was computed beforehand, and
    • it also creates variables for sub-expressions with only a single occurrence, which may not be desirable (but this should easily be changeable too, by some small adaptations of the code).