In my research I'm trying to tackle the Kolmogorov backward equation, i.e. interested in $$Af = b(x)f'(x)+\sigma(x)f''(x)$$
With the specific b(x) and \sigma(x), I'm trying to see how fast the coefficients of the expression are growing when calculating higher Af powers. I'm struggle to derive this analytically thus tried to see the trend empirically.
First, I have used sympy
:
from sympy import *
import matplotlib.pyplot as plt
import re
import math
import numpy as np
import time
np.set_printoptions(suppress=True)
x = Symbol('x')
b = Function('b')(x)
g = Function('g')(x)
def new_coef(gamma, beta, coef_minus2, coef_minus1, coef):
return expand(simplify(gamma*coef_minus2 + beta*coef_minus1 + 2*gamma*coef_minus1.diff(x)\
+beta*coef.diff(x)+gamma*coef.diff(x,2)))
def new_coef_first(gamma, beta, coef):
return expand(simplify(beta*coef.diff(x)+gamma*coef.diff(x,2)))
def new_coef_second(gamma, beta, coef_minus1, coef):
return expand(simplify(beta*coef_minus1 + 2*gamma*coef_minus1.diff(x)\
+beta*coef.diff(x)+gamma*coef.diff(x,2)))
def new_coef_last(gamma, beta, coef_minus2):
return lambda x: gamma(x)*coef_minus2(x)
def new_coef_last(gamma, beta, coef_minus2):
return expand(simplify(gamma*coef_minus2 ))
def new_coef_second_to_last(gamma, beta, coef_minus2, coef_minus1):
return expand(simplify(gamma*coef_minus2 + beta*coef_minus1 + 2*gamma*coef_minus1.diff(x)))
def set_to_zero(expression):
expression = expression.subs(Derivative(b, x, x, x), 0)
expression = expression.subs(Derivative(b, x, x), 0)
expression = expression.subs(Derivative(g, x, x, x, x), 0)
expression = expression.subs(Derivative(g, x, x, x), 0)
return expression
def sum_of_coef(expression):
sum_of_coef = 0
for i in str(expression).split(' + '):
if i[0:1] == '(':
i = i[1:]
integers = re.findall(r'\b\d+\b', i)
if len(integers) > 0:
length_int = len(integers[0])
if i[0:length_int] == integers[0]:
sum_of_coef += int(integers[0])
else:
sum_of_coef += 1
else:
sum_of_coef += 1
return sum_of_coef
power = 6
charar = np.zeros((power, power*2), dtype=Symbol)
coef_sum_array = np.zeros((power, power*2))
charar[0,0] = b
charar[0,1] = g
coef_sum_array[0,0] = 1
coef_sum_array[0,1] = 1
for i in range(1, power):
#print(i)
for j in range(0, (i+1)*2):
#print(j, ':')
#start_time = time.time()
if j == 0:
charar[i,j] = set_to_zero(new_coef_first(g, b, charar[i-1, j]))
elif j == 1:
charar[i,j] = set_to_zero(new_coef_second(g, b, charar[i-1, j-1], charar[i-1, j]))
elif j == (i+1)*2-2:
charar[i,j] = set_to_zero(new_coef_second_to_last(g, b, charar[i-1, j-2], charar[i-1, j-1]))
elif j == (i+1)*2-1:
charar[i,j] = set_to_zero(new_coef_last(g, b, charar[i-1, j-2]))
else:
charar[i,j] = set_to_zero(new_coef(g, b, charar[i-1, j-2], charar[i-1, j-1], charar[i-1, j]))
#print("--- %s seconds for expression---" % (time.time() - start_time))
#start_time = time.time()
coef_sum_array[i,j] = sum_of_coef(charar[i,j])
#print("--- %s seconds for coeffiecients---" % (time.time() - start_time))
coef_sum_array
Then, looked into automated differentiation and used autograd:
import autograd.numpy as np
from autograd import grad
import time
np.set_printoptions(suppress=True)
b = lambda x: 1 + x
g = lambda x: 1 + x + x**2
def new_coef(gamma, beta, coef_minus2, coef_minus1, coef):
return lambda x: gamma(x)*coef_minus2(x) + beta(x)*coef_minus1(x) + 2*gamma(x)*grad(coef_minus1)(x)\
+beta(x)*grad(coef)(x)+gamma(x)*grad(grad(coef))(x)
def new_coef_first(gamma, beta, coef):
return lambda x: beta(x)*grad(coef)(x)+gamma(x)*grad(grad(coef))(x)
def new_coef_second(gamma, beta, coef_minus1, coef):
return lambda x: beta(x)*coef_minus1(x) + 2*gamma(x)*grad(coef_minus1)(x)\
+beta(x)*grad(coef)(x)+gamma(x)*grad(grad(coef))(x)
def new_coef_last(gamma, beta, coef_minus2):
return lambda x: gamma(x)*coef_minus2(x)
def new_coef_second_to_last(gamma, beta, coef_minus2, coef_minus1):
return lambda x: gamma(x)*coef_minus2(x) + beta(x)*coef_minus1(x) + 2*gamma(x)*grad(coef_minus1)(x)
power = 6
coef_sum_array = np.zeros((power, power*2))
coef_sum_array[0,0] = b(1.0)
coef_sum_array[0,1] = g(1.0)
charar = [b, g]
for i in range(1, power):
print(i)
charar_new = []
for j in range(0, (i+1)*2):
if j == 0:
new_funct = new_coef_first(g, b, charar[j])
elif j == 1:
new_funct = new_coef_second(g, b, charar[j-1], charar[j])
elif j == (i+1)*2-2:
new_funct = new_coef_second_to_last(g, b, charar[j-2], charar[j-1])
elif j == (i+1)*2-1:
new_funct = new_coef_last(g, b, charar[j-2])
else:
new_funct = new_coef(g, b, charar[j-2], charar[j-1], charar[j])
coef_sum_array[i,j] = new_funct(1.0)
charar_new.append(new_funct)
charar = charar_new
coef_sum_array
However, I'm so not happy with the speed of either of them. I would like to do at least thousand iterations while after 3 days of running simpy method, I got 30 :/
I expect that the second method (numerical) could be optimized to avoid recalculating expressions every time. Unfortunately, I cannot see that solution myself. Also, I have tried Maple, but again without luck.
So, there are two formulas about derivatives that are interesting here:
f(g(x))
, and looks a lot like the Multinomial theoremf(x)*g(x)
and looks a lot like the Binomial theoremBoth of these have been discussed in pull request #13892 the n-th derivative was sped up using the general Leibniz rule.
I'm trying to see how fast the coefficients of the expression are growing
In your code, the general formula for computing c[i][j]
is this:
c[i][j] = g * c[i-1][j-2] + b * c[i-1][j-1] + 2 * g * c'[i-1][j-1] + g * c''[i-1][j]
(where by c'[i][j]
and c''[i][j]
are the 1st and 2nd derivatives of c[i][j]
)
Because of this, and by the Leibniz rule mentioned above, I think intuitively, the coefficients computed should be related to Pascal's triangle (or at the very least they should have some combinatorial relation).
In the original code, the function sum_to_coef(f)
serializes the expression f
to a string and then discarding everything that doesn't look like a number, and then sums the remaining numbers.
We can avoid serialization here by just traversing the expression tree and collecting what we need
def sum_of_coef(f):
s = 0
if f.func == Add:
for sum_term in f.args:
res = sum_term if sum_term.is_Number else 1
if len(sum_term.args) == 0:
s += res
continue
first = sum_term.args[0]
if first.is_Number == True:
res = first
else:
res = 1
s += res
elif f.func == Mul:
first = f.args[0]
if first.is_Number == True:
s = first
else:
s = 1
elif f.func == Pow:
s = 1
return s
In the function set_to_zero(expr)
all the 2nd and 3rd derivatives of b
, and the 3rd and 4th derivatives of g
are replaced by zero.
We can collapse all those substitutions into one statement like so:
b3,b2=b.diff(x,3),b.diff(x,2)
g4,g3=g.diff(x,4),g.diff(x,3)
def set_to_zero(expression):
expression = expression.subs({b3:0,b2:0,g4:0,g3:0})
return expression
In the original code, for every cell c[i][j]
we're calling simplify
. This turns out to have a big impact on performance but actually we can skip this call, because fortunately our expressions are just sums of products of derivatives or unknown functions.
So the line
charar[i,j] = set_to_zero(expand(simplify(expr)))
becomes
charar[i,j] = set_to_zero(expand(expr))
The following was also tried but turned out to have very little impact.
For two consecutive values of j, we're computing c'[i-1][j-1]
twice.
j-1 c[i-1][j-3] c[i-1][j-2] c[i-1][j-1]
j c[i-1][j-2] c[i-1][j-1] c[i-1][j]
If you look at the loop formula in the else
branch, you see that c'[i-1][j-1]
has already been computed. It can be cached, but this optimization
has little effect in the SymPy version of the code.
Here it's also important to mention that it's possible to visualize the call tree of SymPy involved in computing these derivatives. It's actually larger, but here is part of it:
We can also generate a flamegraph using the py-spy module just to see where time is being spent:
As far as I could tell, 34% of the time spent in _eval_derivative_n_times
, 10% of the time spent in the function getit
from assumptions.py
, 12% of the time spent in subs(..)
, 12% of the time spent in expand(..)
Apparently when pull request #13892 was merged into SymPy, it also introduced a performance regression.
In one of the comments regarding that regression, Ondrej Certik recommends using SymEngine to improve performance of code that makes heavy use of derivatives.
So I've ported the code mentioned to SymEngine.py and noticed that it runs 98 times faster than the SymPy version for power=8
( and 4320 times faster for power=30
)
The required module can be installed via pip3 install --user symengine
.
#!/usr/bin/python3
from symengine import *
import pprint
x=var("x")
b=Function("b")(x)
g=Function("g")(x)
b3,b2=b.diff(x,3),b.diff(x,2)
g4,g3=g.diff(x,4),g.diff(x,3)
def set_to_zero(e):
e = e.subs({b3:0,b2:0,g4:0,g3:0})
return e
def sum_of_coef(f):
s = 0
if f.func == Add:
for sum_term in f.args:
res = 1
if len(sum_term.args) == 0:
s += res
continue
first = sum_term.args[0]
if first.is_Number == True:
res = first
else:
res = 1
s += res
elif f.func == Mul:
first = f.args[0]
if first.is_Number == True:
s = first
else:
s = 1
elif f.func == Pow:
s = 1
return s
def main():
power = 8
charar = [[0] * (power*2) for x in range(power)]
coef_sum_array = [[0] * (power*2) for x in range(power)]
charar[0][0] = b
charar[0][1] = g
init_printing()
for i in range(1, power):
jmax = (i+1)*2
for j in range(0, jmax):
c2,c1,c0 = charar[i-1][j-2],charar[i-1][j-1],charar[i-1][j]
#print(c2,c1,c0)
if j == 0:
expr = b*c0.diff(x) + g*c0.diff(x,2)
elif j == 1:
expr = b*c1 + 2*g*c1.diff(x) + b*c0.diff(x) + g*c0.diff(x,2)
elif j == jmax-2:
expr = g*c2 + b*c1 + 2*g*c1.diff(x)
elif j == jmax-1:
expr = g*c2
else:
expr = g*c2 + b*c1 + 2*g*c1.diff(x) + b*c0.diff(x) + g*c0.diff(x,2)
charar[i][j] = set_to_zero(expand(expr))
coef_sum_array[i][j] = sum_of_coef(charar[i][j])
pprint.pprint(Matrix(coef_sum_array))
main()
I think it would be very interesting to look at the number of terms in c[i][j]
to determine how quickly the expressions are growing. That would definitely help in estimating the complexity of the current code.
But for practical purposes I've plotted the current time and memory consumption of the SymEngine code above and managed to get the following chart:
Both the time and the memory seem to be growing polynomially with the input (the power
parameter in the original code).
The same chart but as a log-log plot can be viewed here:
Like the wiki page says, a straight line on a log-log plot corresponds to a monomial. This offers a way to recover the exponent of the monomial.
So if we consider two points N=16 and N=32 between which the log-log plot looks like a straight line
import pandas as pd
df=pd.read_csv("modif6_bench.txt", sep=',',header=0)
def find_slope(col1,col2,i1,i2):
xData = df[col1].to_numpy()
yData = df[col2].to_numpy()
x0,x1 = xData[i1],xData[i2]
y0,y1 = yData[i1],yData[i2]
m = log(y1/y0)/log(x1/x0)
return m
print("time slope = {0:0.2f}".format(find_slope("N","time",16,32)))
print("memory slope = {0:0.2f}".format(find_slope("N","memory",16,32)))
Output:
time slope = 5.69
memory slope = 2.62
So very rough approximation of time complexity would be O(n^5.69)
and an approximation of space complexity would be O(2^2.62)
.
There are more details about deciding whether the growth rate is polynomial or exponential here (it involves drawing a semi-log and a log-log plot, and seeing where the data shows up as a straight line).
b
and g
functionsIn the first original code block, the functions b
and g
were undefined functions. This means SymPy and SymEngine didn't know anything about them.
The 2nd original code block defines b=1+x
and g=1+x+x**2
. If we run all of this again with known b
and g
the code runs much faster, and the running time curve and the memory usage curves are better than with unknown functions
time slope = 2.95
memory slope = 1.35
I wanted to look a bit more into matching the observed resource consumption(time and memory), so I wrote the following Python module that fits each growth rate (from a catalog of common such growth rates) to the recorded data, and then shows the plot to the user.
It can be installed via pip3 install --user matchgrowth
When run like this:
match-growth.py --infile ./tests/modif7_bench.txt --outfile time.png --col1 N --col2 time --top 1
It produces graphs of the resource usage, as well as the closest growth rates it matches to. In this case, it finds the polynomial growth to be closest:
If you run this for power=8
(in the symengine code mentioned above) the coefficients will look like this:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 5, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 17, 40, 31, 9, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 53, 292, 487, 330, 106, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 161, 1912, 6091, 7677, 4693, 1520, 270, 25, 1, 0, 0, 0, 0, 0, 0]
[1, 485, 11956, 68719, 147522, 150706, 83088, 26573, 5075, 575, 36, 1, 0, 0, 0, 0]
[1, 1457, 73192, 735499, 2568381, 4118677, 3528928, 1772038, 550620, 108948, 13776, 1085, 49, 1, 0, 0]
[1, 4373, 443524, 7649215, 42276402, 102638002, 130209104, 96143469, 44255170, 13270378, 2658264, 358890, 32340, 1876, 64, 1]
So as it turns out, the 2nd column coincides with A048473 which according to OEIS is "The number of triangles (of all sizes, including holes) in Sierpiński's triangle after n inscriptions".
All the code for this is also available in this repo.