
Parallelize a matmul-like matrix computation (tensor product) in Numpy


I want to implement a tensor Tucker product such that:

  • Input: B in shape (m, n), C in shape (n, n, n) (cube).
  • Output: Y in shape (m, m, m), s.t. Y_ijk = ∑_{0≤a,b,c<n} B_ia * B_jb * B_kc * C_abc.

The raw for loop code:

import numpy as np

def raw_tensor_product(B, C, i, j, k):
    '''
    @param B: (m, n)
    @param C: (n, n, n)
    @param i: index in range(0, m)
    @param j: index in range(0, m)
    @param k: index in range(0, m)
    @return: the (i, j, k) entry of the product result Y
    '''
    m, n = B.shape
    return np.sum([B[i, a] * B[j, b] * B[k, c] * C[a, b, c]
                   for a in range(n) for b in range(n) for c in range(n)])

How can I write equivalent NumPy code that returns the whole (m, m, m) result Y directly? I tried tensordot and matmul but couldn't get it right.

Note that C's dimension can be generalized. If C is (n, n), the result would be B.dot(C).dot(B.T). Likewise, if C is (n, n, n, n), the result would be an (m, m, m, m) tensor; how can that case be implemented?
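One way to handle the general case is to contract B into each mode of C in turn with np.tensordot. A sketch (the function name tucker_product is my own; it assumes C is a cube tensor with every axis of length n):

```python
import numpy as np

def tucker_product(B, C):
    """Contract B (m, n) into every mode of C (n, n, ..., n).

    Returns a tensor of shape (m, m, ..., m) with the same number
    of dimensions as C.
    """
    Y = C
    for _ in range(C.ndim):
        # Contract the current leading axis of Y with B's second axis.
        # tensordot appends the new m-sized axis at the end, so after
        # C.ndim passes every original n-sized axis has been replaced.
        Y = np.tensordot(Y, B, axes=(0, 1))
    return Y
```

For a 2-D C this reduces to B.dot(C).dot(B.T), and for higher orders it produces the (m, m, ..., m) tensor without any changes to the code.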

Also, could you recommend some materials where I could learn such techniques? Thanks.


Solution

  • Here is a way:

    np.tensordot(np.tensordot(np.tensordot(C, B, (0, 1)), B, (0, 1)), B, (0, 1))
    

    But I'm not sure whether this is the most efficient/elegant method.
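  • An equivalent and arguably more readable formulation uses np.einsum, which spells out the index equation directly; passing optimize=True lets NumPy choose a pairwise contraction order instead of forming the full four-way product naively. A sketch (the function name tucker_einsum is my own):

    ```python
    import numpy as np

    def tucker_einsum(B, C):
        # Y[i, j, k] = sum_{a, b, c} B[i, a] * B[j, b] * B[k, c] * C[a, b, c]
        return np.einsum('ia,jb,kc,abc->ijk', B, B, B, C, optimize=True)
    ```

    For the 4-D generalization the subscripts extend naturally, e.g. 'ia,jb,kc,ld,abcd->ijkl' with four copies of B.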