Tags: python, numpy, numba

How to perform fast tensor contraction with two tensors and a vector?


I'm using NumPy (and ideally Numba) to perform a tensor contraction involving three tensors, one of which is a vector that should multiply only a single index of the other two. For example,

import numpy as np

A = np.random.normal(size=(20, 20, 20, 20))
B = np.random.normal(size=(20, 20, 20, 20))
v = np.sqrt(np.arange(20))

# e.g. v on the 3rd index (axis 2)
%timeit np.vdot(A * v[None, None, :, None], B)
125 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

compare with

C = np.random.normal(size=(20,20,20,20))

%timeit np.vdot(A * C, B)
76.8 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Is there a more efficient way of including the product with v? It feels wrong that it should be slower than multiplying by the full tensor C.
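
For reference, the quantity being computed is the scalar sum over all four indices of A[i,j,k,l] * v[k] * B[i,j,k,l]. It can be stated in a single einsum call; whether this beats the explicit broadcast depends on the NumPy build, so treat it as a sketch rather than a guaranteed speed-up:

# Same scalar as np.vdot(A * v[None, None, :, None], B); optimize=True
# lets NumPy choose an evaluation order for the three-operand contraction.
np.einsum('ijkl,k,ijkl->', A, v, B, optimize=True)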


Solution

  • I could squeeze out some performance by using Numba with parallel=True; a fastmath variant is sketched after the timings below.

    import numba as nb
    import numpy as np
    
    N = 50
    
    # Weight the k-axis by vv and contract the last axis with np.dot,
    # parallelising the outermost loop across threads.
    @nb.njit('float64(float64[:,:,:,:], float64[:,:,:,:], float64[:])', parallel=True)
    def dotABv(a, b, vv):
        res = 0.0
        for i in nb.prange(a.shape[0]):
            for j in range(a.shape[1]):
                for k in range(a.shape[2]):
                    # Scalar reduction; Numba recognises it across prange threads.
                    res += vv[k] * np.dot(a[i, j, k, :], b[i, j, k, :])
        return res
    
    v = np.sqrt(np.arange(N))
    A = np.random.normal(size=(N, N, N, N))
    B = np.random.normal(size=(N, N, N, N))
    C = np.random.normal(size=(N, N, N, N))
    
    %timeit dotABv(A, B, v)
    %timeit np.vdot(A, B)  # plain dot without v, just for comparison
    %timeit np.vdot(A * v[None, None, :, None], B)
    # Output :
    # 501 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    # 1.1 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    # 14.7 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    print(dotABv(A, B, v), np.vdot(A * v[None, None, :, None], B))
    # 5105.504508154087 5105.5045081541075  (agreement up to floating-point rounding)
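
    If a strict IEEE summation order is not required, a fastmath variant of the same kernel may vectorise the reduction more aggressively; this is a sketch under that assumption (dotABv_fm is just an illustrative name), and the gain is hardware-dependent:
    
    # fastmath=True lets LLVM reassociate the floating-point sums, which
    # typically enables wider SIMD reductions; the result can differ from
    # dotABv in the last few digits.
    @nb.njit('float64(float64[:,:,:,:], float64[:,:,:,:], float64[:])',
             parallel=True, fastmath=True)
    def dotABv_fm(a, b, vv):
        res = 0.0
        for i in nb.prange(a.shape[0]):
            for j in range(a.shape[1]):
                for k in range(a.shape[2]):
                    res += vv[k] * np.dot(a[i, j, k, :], b[i, j, k, :])
        return res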