What is the best way to do array operations when there are repeated indices, some of which are summed over and some of which are not? It seems like I may have to use einsum for these operations, but it would be better if there were a tensordot alternative with a flag for dimensions that are aligned but not summed. Does anybody know of a fast numerical routine (in LAPACK, perhaps?) that behaves like tensordot, except that some axes can be aligned without being summed?
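To make the operation concrete, here is a plain-loop reference for the product I want, c[i, j, k] = sum_l a[i, j, l] * b[j, k, l] (illustrative only; far too slow at realistic sizes):

import numpy as np

def method_loops(a, b):
    "Reference implementation with explicit loops (slow)."
    I, J, L = a.shape
    _, K, _ = b.shape
    c = np.zeros((I, J, K))
    for i in range(I):
        for j in range(J):
            for k in range(K):
                # j is aligned (not summed); only l is summed
                c[i, j, k] = np.dot(a[i, j, :], b[j, k, :])
    return c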
==
Here is example code to show the type of array operation that is needed. The operation I need is done by method_sum, method_einsum, and method_matmul. A similar operation that also sums over the matching j-axis is done by method2_einsum and method2_tensordot.
By comparing times, it seems that tensordot should be able to beat einsum on the first problem. However, it does not have the functionality to align axes without summing over them.
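For example, with small arrays of the shapes used below, contracting only the l-axis with tensordot leaves both j-axes uncontracted and forms their outer product, so the j dimension appears twice in the output (the result I want is the diagonal over the two j-axes):

import numpy as np
a = np.ones((2, 3, 5))   # (I, J, L)
b = np.ones((3, 4, 5))   # (J, K, L)
print(np.tensordot(a, b, axes=[(2,), (2,)]).shape)   # (2, 3, 3, 4), i.e. (I, J, J, K)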
import numpy as np

# Shapes of arrays
I = 200
J = 50
K = 200
L = 100

a = np.ones((I, J, L))
b = np.ones((J, K, L))

# The desired product has a sum over the l-axis

## Use broadcasting to multiply and sum over the last dimension
def method_sum(a, b):
    "Multiply arrays and sum over last dimension."
    c = (a[:, :, None, :] * b[None, :, :, :]).sum(-1)
    return c

## Use einsum to multiply arrays and sum over the l-axis
def method_einsum(a, b):
    "Multiply arrays and sum over last dimension."
    c = np.einsum('ijl,jkl->ijk', a, b)
    return c

## Use matmul to multiply arrays and sum over one of the axes
def method_matmul(a, b):
    "Multiply arrays using the new matmul operation."
    c = np.matmul(a[:, :, None, None, :],
                  b[None, :, :, :, None])[:, :, :, 0, 0]
    return c
# Compare einsum vs tensordot on summation over j and l

## einsum takes about the same amount of time as when j is not summed over
def method2_einsum(a, b):
    "Multiply arrays and sum over the j- and l-axes."
    c = np.einsum('ijl,jkl->ik', a, b)
    return c

## tensordot can do this faster, but it always sums over the aligned axes
def method2_tensordot(a, b):
    "Multiply and sum over all overlapping dimensions."
    c = np.tensordot(a, b, axes=[(1, 2), (0, 2)])
    return c
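As a sanity check, the three variants of the first product should agree; a quick check with np.allclose on the arrays defined above:

c1 = method_sum(a, b)
c2 = method_einsum(a, b)
c3 = method_matmul(a, b)
assert np.allclose(c1, c2) and np.allclose(c1, c3)   # all compute 'ijl,jkl->ijk'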
Here are some times for the various routines on my computer. tensordot can beat einsum for method2 because it uses multiple cores. I would like to achieve performance like tensordot for the calculation where both the j- and l-axes are aligned but only the l-axis is summed.
Time for method_sum:
1 loops, best of 3: 744 ms per loop
Time for method_einsum:
10 loops, best of 3: 95.1 ms per loop
Time for method_matmul:
10 loops, best of 3: 93.8 ms per loop
Time for method2_einsum:
10 loops, best of 3: 90.4 ms per loop
Time for method2_tensordot:
100 loops, best of 3: 10.9 ms per loop
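For reference, a rough way to reproduce these timings outside IPython (a sketch; absolute numbers depend on the machine and the BLAS build):

import timeit

for fn in (method_sum, method_einsum, method_matmul,
           method2_einsum, method2_tensordot):
    t = min(timeit.repeat(lambda: fn(a, b), number=1, repeat=3))
    print('%s: %.1f ms per loop' % (fn.__name__, t * 1e3))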
==
In [85]: I,J,K,L=2,3,4,5
In [86]: a=np.ones((I,J,L))
In [87]: b=np.ones((J,K,L))
In [88]: np.einsum('ijl,jkl->ijk',a,b).shape
Out[88]: (2, 3, 4)
Playing around with the new @ operator, I find I can produce:
In [91]: (a[:,:,None,None,:]@b[None,:,:,:,None]).shape
Out[91]: (2, 3, 4, 1, 1)
In [93]: (a[:,:,None,None,:]@b[None,:,:,:,None])[...,0,0].shape
Out[93]: (2, 3, 4)
The shapes are right, but I haven't checked the values. Some of the None insertions line up the ijk axes, and the other two produce the regular dot behavior (last axis with the second-to-last).
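The values can be checked with allclose; both expressions compute the same sum over l, so this should return True:

np.allclose(np.einsum('ijl,jkl->ijk', a, b),
            (a[:,:,None,None,:] @ b[None,:,:,:,None])[..., 0, 0])   # expected: True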
With your sizes the times are about the same:
In [97]: np.einsum('ijl,jkl->ijk',a,b).shape
Out[97]: (200, 50, 200)
In [98]: (a[:,:,None,None,:]@b[None,:,:,:,None])[...,0,0].shape
Out[98]: (200, 50, 200)
In [99]: timeit np.einsum('ijl,jkl->ijk',a,b).shape
1 loop, best of 3: 2 s per loop
In [100]: timeit (a[:,:,None,None,:]@b[None,:,:,:,None])[...,0,0].shape
1 loop, best of 3: 2.06 s per loop
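A variant of the same matmul idea treats the shared j-axis as a batch dimension, so no singleton axes are needed (a sketch, not timed here; it computes the same 'ijl,jkl->ijk' product):

def method_matmul_batched(a, b):
    "Treat j as a batch axis: (J,I,L) @ (J,L,K) -> (J,I,K), then move j back."
    c = np.matmul(a.transpose(1, 0, 2), b.transpose(0, 2, 1))
    return c.transpose(1, 0, 2)

The result can be compared against method_einsum with np.allclose.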