Tags: arrays, numpy, numpy-einsum

Transpose a 3D array and multiply its slices: memory contiguity


I have a huge 3D array with A.shape == (100000, 5000, 50). I need to transpose it to shape (50, 5000, 100000), so that each of the 50 slices along the first axis is a.T for a matrix a = A[:, :, i] of shape (100000, 5000). Then I need to compute a.T @ a for each of these 50 matrices, which gives a 3D array of shape (50, 5000, 5000).
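
A small-scale sketch of the shape bookkeeping described above. The array sizes, the random seed, and the names `ref` and `batched` are illustrative assumptions, not the real data. It checks that the batched transpose-and-matmul expression gives the same result as looping over the last axis and computing a.T @ a per slice.

```python
import numpy as np

# Small stand-in for the real (100000, 5000, 50) array; sizes are arbitrary.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 7, 3))

# Reference: loop over the last axis, one (n, m) matrix a = A[:, :, i] at a time.
ref = np.stack([A[:, :, i].T @ A[:, :, i] for i in range(A.shape[2])])

# Batched form: (p, m, n) @ (p, n, m) -> (p, m, m), matmul applied per leading index.
batched = A.transpose(2, 1, 0) @ A.transpose(2, 0, 1)

print(ref.shape, batched.shape)  # (3, 7, 7) (3, 7, 7)
```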

If I do this with A.transpose(2, 1, 0) @ A.transpose(2, 0, 1), each individual product a.T @ a turns out to be about a thousand times slower than the same product on a matrix a that was not extracted from A.

The problem is that after transposing, the 3D array is no longer contiguous in memory. I tried calling np.ascontiguousarray() or .copy() after transposing. That helps, but the multiplication is still slower, and the copy itself takes a significant amount of time.
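
To make the contiguity point concrete, here is a small sketch (the array `A` of zeros and its shape are arbitrary stand-ins). `transpose` returns a view that only permutes the strides, so the result is not C-contiguous; `np.ascontiguousarray` allocates a new buffer and copies the data into C order, which is where the copying time goes.

```python
import numpy as np

A = np.zeros((40, 6, 5))           # small C-contiguous float64 stand-in
At = A.transpose(2, 0, 1)          # a view: no data moved, only strides permuted

print(A.flags['C_CONTIGUOUS'])     # True
print(At.flags['C_CONTIGUOUS'])    # False
print(A.strides, At.strides)       # byte strides before and after the transpose

Ac = np.ascontiguousarray(At)      # allocates and fills a new C-ordered buffer
print(Ac.flags['C_CONTIGUOUS'], np.shares_memory(A, Ac))  # True False
```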

Could anyone suggest a better approach? In particular, I tried to use np.einsum but could not get it to work.


Solution

  • You can try the following:

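    import numpy as np

    A = ...  # the (100000, 5000, 50) array from the question
    # b[i, k, l] = sum over j of A[j, k, i] * A[j, l, i], i.e. b[i] = A[:, :, i].T @ A[:, :, i]
    b = np.einsum('jki,jli->ikl', A, A)
    print(b.shape)
    # (50, 5000, 5000)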
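
A quick correctness check of the einsum subscripts on a small random array (the sizes, seed, and the name `ref` are illustrative assumptions). The reference copies one (n, m) slice at a time into contiguous memory before computing a.T @ a, so only one slice is duplicated at any moment; it is used here purely to confirm that 'jki,jli->ikl' computes the same per-slice products, and no timing claim is made.

```python
import numpy as np

# Small stand-in with the same axis roles as the question's (100000, 5000, 50) array.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 8, 4))

# einsum: b[i, k, l] = sum_j A[j, k, i] * A[j, l, i]
b = np.einsum('jki,jli->ikl', A, A)

# Reference: contiguous copy of one slice at a time, then a.T @ a.
ref = np.empty((A.shape[2], A.shape[1], A.shape[1]))
for i in range(A.shape[2]):
    a = np.ascontiguousarray(A[:, :, i])
    ref[i] = a.T @ a

print(b.shape)  # (4, 8, 8)
```

Each b[i] is a Gram matrix, so it is also symmetric, which is another cheap sanity check on real data.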