Tags: python, arrays, numpy, covariance, point-clouds

Numpy.Cov of a Large Nx3 Array Produces MemoryError


I have a large 2D array of size Nx3. This array contains point cloud data in (X, Y, Z) format. I am using Python on Ubuntu in a virtual environment to read the data from a .ply file.

When I try to compute the covariance of this array with rowvar set to True (meaning each row is treated as a variable), I get a MemoryError.

I understand that this creates a very large N x N array, apparently too large for my 8 GB of allocated memory to handle. Without increasing the memory allocation, is there a different way of getting around this issue? Are there other methods of calculating the covariance matrix elements that do not overload the memory?
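
For reference, this is roughly the call that fails; the size and random data here are just stand-ins for my actual point cloud:

    import numpy as np

    N = 23000                            # stand-in for the (large) number of points
    points = np.random.random((N, 3))    # stands in for the (X, Y, Z) data read from the .ply file

    # With rowvar=True each of the N rows is a variable, so the result is an
    # N x N float64 matrix: N**2 * 8 bytes, i.e. roughly 4.2 GB for N = 23000.
    cov = np.cov(points, rowvar=True)    # raises MemoryError once this no longer fits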


Solution

  • You could chop the computation up into blocks in a loop and keep only the upper triangle of the covariance matrix.

    import numpy as np
    
    N = 23000
    a = np.random.random((N, 3))             # stand-in for the (X, Y, Z) point cloud
    # rowvar=True means each row is a variable, so center each row over its 3 observations
    c = a - a.mean(axis=-1, keepdims=True)
    # packed storage for the upper triangle of the N x N covariance matrix
    out = np.empty((N*(N+1) // 2,))
    
    def ravel_triu(i, j, n):
        # map (i, j) to its position in the row-major packed upper triangle
        i, j = np.where(i > j, np.broadcast_arrays(j, i), np.broadcast_arrays(i, j))
        return i*n - i*(i+1) // 2 + j
    
    def unravel_triu(k, n):
        # inverse of ravel_triu: recover (i, j) from the packed index k
        i = n - (0.5 + np.sqrt(n*(n+1) - 2*k - 1)).astype(int)
        return i, k - (i*n - i*(i+1) // 2)
    
    ii, jj = np.ogrid[:N, :N]
    for j in range(0, N, 500):
        # one 500-row block of c @ c.T, keeping only the entries on or above the diagonal
        out[ravel_triu(j, j, N):ravel_triu(min(N, j+500), min(N, j+500), N)] \
            = np.einsum('i...k,...jk->ij', c[j:j+500], c[j:])[ii[j:j+500] <= jj[:, j:]]
    
    # np.cov normalizes by (#observations - 1), here 3 - 1 = 2; do the same so
    # out holds the actual covariances
    out /= a.shape[1] - 1
    
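    To read a single covariance back out of the packed array, ravel_triu gives its position; a small usage sketch (the indices are arbitrary):

    i, j = 100, 2500                                   # any pair with 0 <= i, j < N
    cov_ij = out[ravel_triu(i, j, N)]                  # the value np.cov(a)[i, j] would have had
    row, col = unravel_triu(ravel_triu(i, j, N), N)    # and back again: (100, 2500)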

    Obviously your covariances will be quite undersampled and the covariance matrix highly rank-deficient: with only three observations per variable (and the mean removed), the N x N matrix has rank at most 2...
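
    If you want to convince yourself that the packed upper triangle really is what np.cov would have produced, a quick check on a small, hypothetical N (small enough for the full matrix to fit in memory) looks like this:

    import numpy as np

    N_small = 1000                                    # hypothetical size that np.cov can handle
    a_small = np.random.random((N_small, 3))
    c_small = a_small - a_small.mean(axis=-1, keepdims=True)
    # same centering + upper-triangle packing + (n_obs - 1) normalization as above,
    # just without the blocking
    packed = (c_small @ c_small.T)[np.triu_indices(N_small)] / (a_small.shape[1] - 1)
    full = np.cov(a_small)                            # rowvar=True is the default
    print(np.allclose(packed, full[np.triu_indices(N_small)]))   # True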