I have a huge 3D array with A.shape = (100000, 5000, 50). I need to transpose it to get an array of shape (50, 5000, 100000), and then compute a.T @ a for each of the 50 matrices a = A[:, :, i] (each of shape (100000, 5000)) contained in A. This gives a 3D array of shape (50, 5000, 5000).
If I do this with A.transpose(2, 1, 0) @ A.transpose(2, 0, 1), the individual matrix multiplications a.T @ a turn out to be about a thousand times slower than when a is a standalone array rather than a slice extracted from A.
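For concreteness, here is a reduced-size sketch of the computation (the sizes below are only stand-ins for the real ones so the example runs quickly):

import numpy as np

# Scaled-down stand-in for the real (100000, 5000, 50) array.
A = np.random.rand(1000, 200, 50)

# Batched version: B[i] = A[:, :, i].T @ A[:, :, i] for each of the 50 matrices.
B = A.transpose(2, 1, 0) @ A.transpose(2, 0, 1)
print(B.shape)  # (50, 200, 200)

# The same product on a standalone contiguous copy, for comparison;
# the operands above are strided (non-contiguous) views of A.
a = A[:, :, 0].copy()
print(np.allclose(B[0], a.T @ a))  # True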
The problem is that after transposing, the 3D array is no longer contiguous. I tried using np.ascontiguousarray() or copy() after transposing; this helps, but it is still slower and a fair amount of time is spent on the copy itself.
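The copy-based workaround looks roughly like this (same reduced sizes as above; the explicit per-slice copies are where the extra time goes):

import numpy as np

A = np.random.rand(1000, 200, 50)  # reduced stand-in for the real array

# Make each slice contiguous before the BLAS call: the multiplication itself
# gets faster, but a full copy of A is paid for along the way.
B = np.empty((A.shape[2], A.shape[1], A.shape[1]))
for i in range(A.shape[2]):
    a = np.ascontiguousarray(A[:, :, i])  # copy; A[:, :, i] is a strided view
    B[i] = a.T @ a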
Could anyone suggest a better approach? In particular, I have been trying to use np.einsum but could not get it to work.
You can try the following:
import numpy as np

A = ...  # your array of shape (100000, 5000, 50)
b = np.einsum('jki,jli->ikl', A, A)  # b[i] = A[:, :, i].T @ A[:, :, i]
print(b.shape)
# (50, 5000, 5000)
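Here 'jki,jli->ikl' sums over the shared first axis j, so b[i] equals A[:, :, i].T @ A[:, :, i] without building a transposed or contiguous copy of A. A quick sanity check on a reduced-size array (sizes chosen only so it runs quickly):

import numpy as np

A = np.random.rand(1000, 200, 50)  # reduced stand-in for the real array

b = np.einsum('jki,jli->ikl', A, A)
B = A.transpose(2, 1, 0) @ A.transpose(2, 0, 1)
print(b.shape)            # (50, 200, 200)
print(np.allclose(b, B))  # True

Depending on your NumPy/BLAS setup it can also be worth passing optimize=True to np.einsum, though whether it helps for this particular contraction varies.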