Tags: python, arrays, numpy, covariance, point-clouds

Numpy.Cov of a Large Nx3 Array Produces MemoryError


I have a large 2D array of size Nx3. This array contains point cloud data in (X, Y, Z) format. I am using Python on Ubuntu in a virtual environment to read the data from a .ply file.

When I try to compute the covariance of this array with rowvar set to True (meaning each row is treated as a variable), I get a MemoryError.

I understand that this creates a very large N x N array, apparently too large for my 8 GB of allocated memory to handle. Without increasing the memory allocation, is there a different way of getting around this issue? Are there other methods of calculating the covariance matrix elements that do not overload the memory?
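
For reference, this is roughly the call that fails; the size and random data here are just stand-ins for my actual point cloud:

    import numpy as np

    N = 23000                            # stand-in for the (large) number of points
    points = np.random.random((N, 3))    # stands in for the (X, Y, Z) data read from the .ply file

    # With rowvar=True each of the N rows is a variable, so the result is an
    # N x N float64 matrix: N**2 * 8 bytes, i.e. roughly 4.2 GB for N = 23000.
    cov = np.cov(points, rowvar=True)    # raises MemoryError once this no longer fits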


Solution

  • You could chop the computation up into blocks in a loop and keep only the upper triangle of the covariance matrix.

    import numpy as np
    
    N = 23000
    a = np.random.random((N, 3))             # stand-in for the (X, Y, Z) point cloud
    # rowvar=True means each row is a variable, so center each row over its 3 observations
    c = a - a.mean(axis=-1, keepdims=True)
    # packed storage for the upper triangle of the N x N covariance matrix
    out = np.empty((N*(N+1) // 2,))
    
    def ravel_triu(i, j, n):
        # map (i, j) to its position in the row-major packed upper triangle
        i, j = np.where(i > j, np.broadcast_arrays(j, i), np.broadcast_arrays(i, j))
        return i*n - i*(i+1) // 2 + j
    
    def unravel_triu(k, n):
        # inverse of ravel_triu: recover (i, j) from the packed index k
        i = n - (0.5 + np.sqrt(n*(n+1) - 2*k - 1)).astype(int)
        return i, k - (i*n - i*(i+1) // 2)
    
    ii, jj = np.ogrid[:N, :N]
    for j in range(0, N, 500):
        # one 500-row block of c @ c.T, keeping only the entries on or above the diagonal
        out[ravel_triu(j, j, N):ravel_triu(min(N, j+500), min(N, j+500), N)] \
            = np.einsum('i...k,...jk->ij', c[j:j+500], c[j:])[ii[j:j+500] <= jj[:, j:]]
    
    # np.cov normalizes by (#observations - 1), here 3 - 1 = 2; do the same so
    # out holds the actual covariances
    out /= a.shape[1] - 1
    
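    To read a single covariance back out of the packed array, ravel_triu gives its position; a small usage sketch (the indices are arbitrary):

    i, j = 100, 2500                                   # any pair with 0 <= i, j < N
    cov_ij = out[ravel_triu(i, j, N)]                  # the value np.cov(a)[i, j] would have had
    row, col = unravel_triu(ravel_triu(i, j, N), N)    # and back again: (100, 2500)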

    Obviously your covariances will be quite undersampled and the covariance matrix highly rank-deficient: with only three observations per variable (and the mean removed), the N x N matrix has rank at most 2...
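
    If you want to convince yourself that the packed upper triangle really is what np.cov would have produced, a quick check on a small, hypothetical N (small enough for the full matrix to fit in memory) looks like this:

    import numpy as np

    N_small = 1000                                    # hypothetical size that np.cov can handle
    a_small = np.random.random((N_small, 3))
    c_small = a_small - a_small.mean(axis=-1, keepdims=True)
    # same centering + upper-triangle packing + (n_obs - 1) normalization as above,
    # just without the blocking
    packed = (c_small @ c_small.T)[np.triu_indices(N_small)] / (a_small.shape[1] - 1)
    full = np.cov(a_small)                            # rowvar=True is the default
    print(np.allclose(packed, full[np.triu_indices(N_small)]))   # True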