Tags: python, numpy, matrix, vectorization

Vectorisation of a numpy matrix


I'm working on numerical option pricing and am now trying to vectorise inputs for performance reasons. The function in my GBM class that I want to vectorise over its u input is as follows.

def spitzer_recurrence(self, m: int, u, x, t, T, K, num=False):
    p = np.zeros((m, m), dtype=complex)
    for I in range(m):
        p[0, I] = self.ana_a(I, u, t, (T-t)/m) / (I+1)

    out = p[0, m-1]

    for I in range(1, m):
        for J in range(m - I):
            for k in range(J+1):
                p[I, J] += p[I-1, k] * p[0, J-k]
        out += p[I, m-1-I] / math.factorial(I+1)

    return out

where ana_a() is defined as

def ana_a(self, k, u, t, stepsize):
    tau_k = k * stepsize
    alpha = self.mu - self.sigma**2/2

    t1 = norm.cdf(-alpha*math.sqrt(tau_k)/self.sigma)
    f1 = 0.5*np.exp(-0.5*u**2*self.sigma**2*tau_k + 1j*tau_k*u*alpha)
    f2 = erfc(-math.sqrt(tau_k/2) * (u*self.sigma*1j + alpha/self.sigma))
    return t1 + f1 * f2

Passing a NumPy array for u into this function raises an error at the line p[0,I] = self.ana_a(I, u, t, (T-t)/m) / (I+1), because it tries to insert an array into a matrix of scalars.

Is there a way to initialise p as a matrix of zero-valued vectors?

TEST PROGRAM

xs = np.linspace(-10, 10, 1000)
gbm = GBM(0.1, 0.2)
a1 = gbm.spitzer_recurrence(20, xs, 100, 0, 1, 100)
a2 = np.array([gbm.spitzer_recurrence(20, x, 100, 0, 1, 100) for x in xs])

print(a1 == a2)

Running this script should print an array of True values, but it does not.


Solution

  • My own MRE

    from scipy.stats import norm
    from scipy.special import erfc
    import math
    import numpy as np
    
    
    def ana_a(k, u, t, stepsize):
        mu = 5.0
        sigma = 2.0
        tau_k = k * stepsize
        alpha = mu - sigma**2/2
    
        t1 = norm.cdf(-alpha*np.sqrt(tau_k)/sigma)
        f1 = 0.5*np.exp(-0.5*u**2*sigma**2*tau_k + 1j*tau_k*u*alpha)
        f2 = erfc(-np.sqrt(tau_k/2) * (u*sigma*1j + alpha/sigma))
        return t1 + f1 * f2
    
    def spitzer_recurrence(m : int, u, x, t, T, K, num=False):
        p = np.zeros((m, m)+np.shape(u), dtype=complex)
        for I in range(m):
            p[0,I] = ana_a(I, u, t, (T-t)/m) / (I+1)
        
        out = p[0, m-1]
        
        for I in range(1, m):
            for J in range(m - I):
                for k in range(J+1):
                    p[I, J] += p[I-1, k] * p[0, J-k]
            out += p[I, m-1-I] / math.factorial(I+1)
    
        return out
    
    # Test that ana_a is vectorized
    U=np.array([0,5,10])
    r1 = np.array([ana_a(1,u,2.0, 0.01) for u in U])
    r2 = ana_a(1,U,2.0,0.01)
    if (r1==r2).all():
        print('ana_a is vectorized')
    else:
        print('ana_a is NOT vectorized')
    
    
    xs = np.linspace(-10, 10, 1000)
    a1 = spitzer_recurrence(20, xs, 100, 0, 1, 100)
    a2 = np.array([spitzer_recurrence(20, x, 100, 0, 1, 100) for x in xs])
    print('spitzer_recurrence vectorized', (a1==a2).all())
    

    Note that the "R" stands for "reproducible", and this is reproducible because anyone can copy it, execute it at home, and not have to figure out what the undefined symbols are. That is why it includes the imports (some folks here are certainly good enough to solve your problem while not being aware that norm.cdf is probably scipy.stats.norm.cdf; after all, you can be well accustomed to using numpy without doing any stats). Anyway, even for me, who has used those functions more than once, most of the time I spent on your problem went into doing what was needed so that norm.cdf and erfc work.

    Likewise, you didn't provide the class, and you were right not to (that is the "M" of "MRE"). The class is not what causes the problem, so there is no need to bother us with its definition. But then, what you do provide should not rely on that missing class: no GBM in the test code, and no self anywhere. Just hard-code some values for mu and sigma. It is not as if those values really mattered, as long as they let the function return something usable for testing.

    In fact, even my MRE is not minimal enough. stepsize, t, m, x (the one in the function, not the one you use for testing, which seems to be what the function calls u), T, K and num could all be hard-coded. Their values are not part of the problem, so every parameter but u isn't really necessary here. Even further: you trust that ana_a is vectorized, which it is only partly, since you use math.sqrt instead of np.sqrt. As long as only u can be a vector that is not a problem, but I replaced them with np.sqrt anyway.

    So, since most of my time was spent filling in the holes, I took the liberty of spending most of my answer ranting about MREs :)

    Vectorization of spitzer_recurrence

    In a first approach, the solution is simply to do exactly what you asked for: have p be a matrix of vectors instead of a matrix of scalars. A matrix (2d array) of vectors (1d arrays) is simply a 3d array. So it is just a matter of replacing p = np.zeros((m, m)) by p = np.zeros((m, m, len(u))).

    The only difficulty here is that we want the function to still work with scalars. (Most of the time, and probably in your case, we don't really need this: we know in advance which parameters are scalars and which are not. People writing general-purpose libraries, such as numpy, want usage to be quite multiform, but your function seems specific enough to suspect that in reality it would not be a problem to decide once and for all that u is a vector. But well, since your own test assumes it can be both...)

    So for that, I need p to be 2d if u is a scalar, and 3d if u is a 1d vector. The good news is that np.shape can compute the shape of both an array and a scalar (which, from numpy's vectorized perspective, is a 0d array, so its shape is ()). So the shape of p is simply (m, m) + np.shape(u). If u is a scalar that is (m, m) + (), i.e. (m, m). If u is a 1d array of size n, that is (m, m) + (n,), i.e. (m, m, n).
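A quick check of that shape arithmetic, with made-up sizes (m=4, a length-5 u):

```python
import numpy as np

m = 4
u_scalar = 3.0
u_vector = np.linspace(-1.0, 1.0, 5)

# np.shape works on scalars (0d, shape ()) and arrays alike
print(np.shape(u_scalar))   # ()
print(np.shape(u_vector))   # (5,)

# so the same allocation adapts automatically to either input
p_scalar = np.zeros((m, m) + np.shape(u_scalar), dtype=complex)
p_vector = np.zeros((m, m) + np.shape(u_vector), dtype=complex)
print(p_scalar.shape)       # (4, 4)
print(p_vector.shape)       # (4, 4, 5)
```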

    More vectorization

    This is vectorized, in the sense that it accepts a vector as input. But there are still for loops in it. To benefit fully from numpy's vectorization power, ideally we would remove all for loops from the code.

    Note that even with a triple loop left, there is not much more to gain: we already gained a ×200 factor with the simple previous vectorization (which was just about making p 3D). When an algorithm consists essentially of nested loops, the important loops to vectorize are the inner ones. If you have 3 nested 1000-iteration for loops, the inner one means Python-level calls 1000×1000×1000 times, the middle one 1000×1000, and the outer one 1000. So there is usually not much to gain once the one or two innermost loops have been removed (depending on how negligible the actual computation time is compared to the overhead of the for loop itself and the Python cost of calling a function).
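To make that concrete, here is a toy sketch (a made-up random array, unrelated to the question's data) showing that vectorizing only the innermost loop already cuts the Python-level iterations from n³ to n²:

```python
import numpy as np

n = 50
a = np.random.default_rng(1).standard_normal((n, n, n))

# fully looped: n*n*n = 125000 Python-level iterations
total_loop = 0.0
for i in range(n):
    for j in range(n):
        for k in range(n):
            total_loop += a[i, j, k]

# innermost loop vectorized: only n*n = 2500 Python-level iterations,
# the last axis is summed inside numpy
total_vec = 0.0
for i in range(n):
    for j in range(n):
        total_vec += a[i, j].sum()

# same result, up to float rounding from the different summation orders
assert np.isclose(total_loop, total_vec, atol=1e-6)
```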

    In your case, you have 2 loops for the first part of the computation (the one calling ana_a): one explicit, and one implicit, done inside ana_a by the vectorization we did before.

    Then 4 nested loops in the second part of the computation: 3 explicit, and 1 implicit, because of the vectorization over the 3rd axis in the p[I-1, k] * p[0, J-k] computation.

    So in both cases, the innermost loop is already vectorized (it is implicit; it is done inside numpy code, which computes the same thing for all values along the 3rd axis, that is for all u).

    Nevertheless, we can remove one more loop from each of those computations.

    def spitzer2(m, u, x, t, T, K, num=False):
        p = np.zeros((m, m)+np.shape(u), dtype=complex)
        II = np.arange(m)[:,None]
        p[0,...] = (ana_a(II, u, t, (T-t)/m) / (II+1)).squeeze()
        
        out = p[0, m-1]
        
        for I in range(1, m):
            for J in range(m - I):
                p[I,J] = (p[I-1,:J+1] * p[0,J::-1]).sum(axis=0)
            out += p[I, m-1-I] / math.factorial(I+1)
    
        return out
    

    Testing with

    a3 = spitzer2(20, xs, 100, 0, 1, 100)
    a4 = np.array([spitzer2(20, x, 100, 0, 1, 100) for x in xs])
    print('same with scalar', np.allclose(a1,a3))
    print('same with vector', np.allclose(a2,a4))
    

    The trick here, for the first computation, is to have ana_a vectorize over two parameters, k and u. Hence the importance, after all, of changing your math.sqrt to np.sqrt: it didn't really matter when only u was non-scalar, but if k, and therefore tau_k, can also be a vector, it becomes important. Since we want the equivalent of a double loop, they cannot both be plain 1d vectors (otherwise numpy would compute ana_a with the first k and the first u, then the second k and the second u, then the third k and the third u, ... or even fail if the vector of k values is not the same size as the vector of u values). We need two axes, one for the values of u and another for the values of k. numpy broadcasting then does the rest and computes ana_a for every possible combination of a k and a u, returning a 2d array of the corresponding results.

    This is what np.arange(m)[:,None] is for: it is a vector of all possible values for k, but with 2 axes.
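A minimal sketch of that broadcasting, shapes only, with made-up sizes (m=4, 5 values of u):

```python
import numpy as np

m, n = 4, 5
II = np.arange(m)[:, None]      # shape (m, 1): one row per value of k
u = np.linspace(-1.0, 1.0, n)   # shape (n,):   one column per value of u

# broadcasting (m, 1) against (n,) yields an (m, n) grid of all (k, u) pairs
grid = II * u
print(II.shape, u.shape, grid.shape)  # (4, 1) (5,) (4, 5)

# each entry combines its row's k with its column's u
assert grid[2, 3] == II[2, 0] * u[3]
```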

    The .squeeze is there because, when we call ana_a with a scalar u, we expect a 1d array (to fit in p[0]), but the division by (II+1) creates a 2d array anyway (even if with only 1 column). .squeeze removes axes of length 1.
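For instance (illustrative shapes only, with a stand-in computation in place of ana_a):

```python
import numpy as np

II = np.arange(4)[:, None]          # shape (4, 1)

# with a scalar u, the result keeps the length-1 axis coming from II ...
res_scalar = np.exp(1j * II * 0.5) / (II + 1)
print(res_scalar.shape)             # (4, 1)

# ... which .squeeze() removes, so it fits into p[0] of shape (4,)
print(res_scalar.squeeze().shape)   # (4,)
```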

    For the second part, we can (at least easily; some people may find a way to do more) remove the inner for k loop. It is, after all, just the sum of an element-wise multiplication; at least it is if you first reverse p[0] along its 2nd axis.
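The equivalence can be checked directly against the explicit loop, on random data with hypothetical shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
p = rng.standard_normal((m, m, n)) + 1j * rng.standard_normal((m, m, n))

I, J = 2, 4
# explicit inner loop: sum over k of p[I-1, k] * p[0, J-k]
loop = sum(p[I-1, k] * p[0, J-k] for k in range(J+1))
# vectorized: reverse p[0] (J+1 rows, in decreasing k order),
# multiply element-wise, then sum over the k axis
vec = (p[I-1, :J+1] * p[0, J::-1]).sum(axis=0)
assert np.allclose(loop, vec)
```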

    Timings

    On my (slow) PC

    Method                                   Execution time
    Non-vectorized                           5.81 s
    Simple vectorization (3d p)              29 ms
    2d ana_a + vectorization of the k loop   20 ms

    So, obviously, the first vectorization was the important one (a ×200 gain, i.e. a 99.5% reduction). But the second one is still worth it (a further ×1.5 gain, or a 30% reduction).