Recently I have to work with algorithms with a lot of long symbolic expressions such as this one
upperside = ( e * e * n * p * tn * tn +
2 * e * e * n * p * tn * tp +
e * e * n * p * tp * tp +
2 * e * n * n * p * te * tn +
2 * e * n * n * p * te * tp +
N * e * n * n * tp * tp +
2 * e * n * p * p * te * tn +
2 * e * n * p * p * te * tp -
2 * N * e * n * p * tn * tp +
N * e * p * p * tn * tn +
n * n * n * p * te * te +
2 * n * n * p * p * te * te +
n * p * p * p * te * te)
remformated
upperside = ( e * e * n * p * tn * tn +
2 * e * e * n * p * tn * tp +
e * e * n * p * tp * tp +
2 * e * n * n * p * te * tn +
2 * e * n * n * p * te * tp +
N * e * n * n * tp * tp +
2 * e * n * p * p * te * tn +
2 * e * n * p * p * te * tp -
2 * N * e * n * p * tn * tp +
N * e * p * p * tn * tn +
n * n * n * p * te * te +
2 * n * n * p * p * te * te +
n * p * p * p * te * te)
These expressions are derived from MATLAB symbolic routine after simplification. It is clear in this case, it is not possible to simplify the algebraic expression by, e.g., merging factors. However, it seems quite possible to simplify this expression so that the actual number of operations are greatly reduced. Unfortunately, I am not able to find such options in MATLAB or Python.
Any help is appreciated.
EDIT The goal is to minimize the operations that CPU needs to perform for such expressions. Since the operations only involves addition and multiplication, I am hoping to something like (e+tn)*(te+tp)+n+.... I have tried to factor the expression but unfortunately the expressions are not factorizable.
If any python package can help, it would probably be Sympy :
from sympy import init_printing, symbols, simplify, collect, factor
e,n,p,tn,te,tp,N = symbols("e,n,p,tn,te,tp,N")
upperside = (e * e * n * p * tn * tn +
2 * e * e * n * p * tn * tp +
e * e * n * p * tp * tp +
2 * e * n * n * p * te * tn +
2 * e * n * n * p * te * tp +
N * e * n * n * tp * tp +
2 * e * n * p * p * te * tn +
2 * e * n * p * p * te * tp -
2 * N * e * n * p * tn * tp +
N * e * p * p * tn * tn +
n * n * n * p * te * te +
2 * n * n * p * p * te * te +
n * p * p * p * te * te)
print collect(upperside, e*n)
It outputs :
N*e*p**2*tn**2 +
e**2*n*(p*tn**2 + 2*p*tn*tp + p*tp**2) +
e*n**2*(N*tp**2 + 2*p*te*tn + 2*p*te*tp) +
e*n*(-2*N*p*tn*tp + 2*p**2*te*tn + 2*p**2*te*tp) +
n**3*p*te**2 +
2*n**2*p**2*te**2 +
n*p**3*te**2
Of all the methods described in this page, collect
looks the most promising.
Here's a quick and dirty way to iterate over all the combinations of symbols and display the shortest expression found :
from sympy import init_printing, symbols, collect, pprint
import itertools
init_printing()
e,n,p,tn,te,tp,big_n = symbols("e,n,p,tn,te,tp,big_n")
upperside = (e * e * n * p * tn * tn + 2 * e * e * n * p * tn * tp +
e * e * n * p * tp * tp + 2 * e * n * n * p * te * tn + 2 * e * n * n * p * te * tp +
big_n * e * n * n * tp * tp + 2 * e * n * p * p * te * tn +
2 * e * n * p * p * te * tp - 2 * big_n * e * n * p * tn * tp + big_n * e * p * p * tn * tn +
n * n * n * p * te * te + 2 * n * n * p * p * te * te + n * p * p * p * te * te)
my_symbols = [e, n, p, tn, te, tp, big_n]
min_length = float('inf')
for i in range(len(my_symbols)):
for symbol_subsets in itertools.combinations(my_symbols, i+1):
collect_by = '*'.join(str(symbol) for symbol in symbol_subsets)
expression = collect(upperside, collect_by)
length = len(str(expression))
if length < min_length:
min_length = length
print "With '%s' :" % collect_by
pprint(expression)
print
It outputs :
With 'e' :
e**2*(n*p*tn**2 + 2*n*p*tn*tp + n*p*tp**2) + e*(big_n*n**2*tp**2 - 2*big_n*n*p*tn*tp + big_n*p**2*tn**2 + 2*n**2*p*te*tn + 2*n**2*p*te*tp + 2*n*p**2*te*tn + 2*n*p**2*te*tp) + n**3*p*te**2 + 2*n**2*p**2*te**2 + n*p**3*te**2
With 'n' :
big_n*e*p**2*tn**2 + n**3*p*te**2 + n**2*(big_n*e*tp**2 + 2*e*p*te*tn + 2*e*p*te*tp + 2*p**2*te**2) + n*(-2*big_n*e*p*tn*tp + e**2*p*tn**2 + 2*e**2*p*tn*tp + e**2*p*tp**2 + 2*e*p**2*te*tn + 2*e*p**2*te*tp + p**3*te**2)
With 'e*n' :
big_n*e*p**2*tn**2 + e**2*n*(p*tn**2 + 2*p*tn*tp + p*tp**2) + e*n**2*(big_n*tp**2 + 2*p*te*tn + 2*p*te*tp) + e*n*(-2*big_n*p*tn*tp + 2*p**2*te*tn + 2*p**2*te*tp) + n**3*p*te**2 + 2*n**2*p**2*te**2 + n*p**3*te**2
With 'e*n*p' :
big_n*e*n**2*tp**2 - 2*big_n*e*n*p*tn*tp + big_n*e*p**2*tn**2 + e**2*n*p*(tn**2 + 2*tn*tp + tp**2) + e*n**2*p*(2*te*tn + 2*te*tp) + e*n*p**2*(2*te*tn + 2*te*tp) + n**3*p*te**2 + 2*n**2*p**2*te**2 + n*p**3*te**2