I have a list of N items, each of d dimensions (so essentially an N x d array). For each item x, I want to compute the outer product x.xT, which gives me an N x d x d array overall. How can I do this efficiently in NumPy? At the moment I am looping through the items and computing each outer product separately:
    for i in range(len(mu)):  # iterate over the N mean vectors
        current_mu = mu[i]  # vector of d elements
        distances = []
        for index in range(len(samples)):
            distance = np.asarray(current_mu - samples[index])[:, None]  # d x 1 column
            distances.append(distance * distance.T)  # broadcasts to d x d
Can I remove the second nested loop or is it required?
You can use numpy.einsum as follows:
    import numpy as np

    N, d = 10, 5
    mu = np.random.rand(N, d)
    r = np.einsum('ni,nj->nij', mu, mu)

    r.shape  # (10, 5, 5)
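To sanity-check the subscripts, each slice r[i] should equal the outer product of row i with itself; a quick comparison against np.outer (a minimal check, not part of the original answer) confirms this:

```python
import numpy as np

N, d = 10, 5
mu = np.random.rand(N, d)
r = np.einsum('ni,nj->nij', mu, mu)

# every r[i] is the d x d outer product mu[i] mu[i]^T
ok = all(np.allclose(r[i], np.outer(mu[i], mu[i])) for i in range(N))
print(ok)
```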
Comparing to a for-loop implementation:
    def for_loop(a):
        N, d = a.shape
        r = np.zeros((N, d, d))
        for i in range(N):
            r[i] = a[i][:, None] @ a[i][None, :]
        return r
# N>d case
N,d = 1000,500
mu = np.random.rand(N,d)
%timeit np.einsum('ni,nj->nij', mu, mu)
1.29 s ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit for_loop(mu)
2.36 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# N<d case
N,d = 100,1000
mu = np.random.rand(N,d)
%timeit np.einsum('ni,nj->nij', mu, mu)
521 ms ± 9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit for_loop(mu)
976 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In both cases einsum is almost 2x faster than the explicit loop.
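Another vectorized option worth benchmarking on your own machine (not part of the timings above) is broadcasting matmul: inserting singleton axes turns each row into a (d, 1) column and a (1, d) row, and `@` then forms one outer product per row. It produces the same result as the einsum call:

```python
import numpy as np

N, d = 100, 50
mu = np.random.rand(N, d)

# (N, d, 1) @ (N, 1, d) -> (N, d, d): one outer product per row
r_matmul = mu[:, :, None] @ mu[:, None, :]
r_einsum = np.einsum('ni,nj->nij', mu, mu)

print(np.allclose(r_matmul, r_einsum))  # True
```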