python numpy scipy sparse-matrix

How to test if two sparse arrays are (almost) equal?


I want to check if two sparse arrays are (almost) equal. For numpy arrays you can do:

import numpy as np

a = np.ones(200)
np.testing.assert_array_almost_equal(a, a)

This does not work for sparse arrays, which I can understand (it either raises AttributeError: ravel not found for smaller matrices, or raises errors related to the size of the array). Is there a scipy equivalent for testing sparse matrices? I could convert my sparse matrices to dense matrices and use the numpy testing function, but sometimes this is not possible due to memory/size constraints. E.g.:

from scipy import sparse

b = sparse.rand(80000,8000,density=0.01)
type(b)  # <class 'scipy.sparse.coo.coo_matrix'>
c = b.toarray()  # ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

Is it possible to test these larger scipy arrays for equality, or should I test smaller samples?
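
For scale, the size constraint can be estimated with some quick arithmetic. This is only a rough sketch: it assumes float64 values and 32-bit indices for the COO storage, and it ignores any per-object overhead.

```python
import numpy as np

rows, cols, density = 80000, 8000, 0.01
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per stored value

# A dense array stores every element, zero or not.
dense_bytes = rows * cols * itemsize

# Assumption: COO keeps one value plus two 32-bit indices per non-zero.
nnz = round(rows * cols * density)
coo_bytes = nnz * (itemsize + 2 * np.dtype(np.int32).itemsize)

print(dense_bytes / 1e9, "GB dense")  # 5.12 GB dense
print(coo_bytes / 1e6, "MB sparse")   # 102.4 MB sparse
```

So the dense copy needs roughly fifty times the memory of the sparse one under these assumptions.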


Solution

  • Assuming we are not concerned with non-zeros in one array that might be within the tolerance value of the corresponding zeros in the other, we can simply get the row and column indices and the corresponding values, then require an exact match between the indices and an allclose() match between the values.

    Hence, the implementation would be -

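    import numpy as np
    from scipy.sparse import find
    
    def allclose(A, B, atol=1e-8):
        # If you want to check matrix shapes as well
        if A.shape != B.shape:
            return False
    
        # Row indices, column indices and values of the non-zero entries
        r1, c1, v1 = find(A)
        r2, c2, v2 = find(B)
    
        # The sparsity patterns must match exactly
        if not (np.array_equal(r1, r2) and np.array_equal(c1, c2)):
            return False
    
        # Values are compared within tolerance
        # (np.allclose also applies its default rtol=1e-5)
        return np.allclose(v1, v2, atol=atol)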
    

    Here's another version that uses the nonzero method and the data attribute in place of the find function -

    def allclose_v2(A, B, atol=1e-8):
        # Assumes COO/CSR/CSC inputs with no explicitly stored zeros,
        # so that .data lines up with the indices from .nonzero()
    
        # If you want to check matrix shapes as well
        if A.shape != B.shape:
            return False
    
        r1, c1 = A.nonzero()
        r2, c2 = B.nonzero()
    
        # Flatten each (row, col) pair into a single linear index
        lidx1 = np.ravel_multi_index((r1, c1), A.shape)
        lidx2 = np.ravel_multi_index((r2, c2), B.shape)
    
        # Sort so the comparison does not depend on storage order
        sidx1 = lidx1.argsort()
        sidx2 = lidx2.argsort()
    
        if not np.array_equal(lidx1[sidx1], lidx2[sidx2]):
            return False
    
        # Compare the values in the same sorted order
        return np.allclose(A.data[sidx1], B.data[sidx2], atol=atol)
    

    We can short-circuit in a few places to speed it up further. For performance, I am focusing mainly on cases where only the values differ.
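
The functions above treat an entry that is stored in only one matrix as a mismatch, even if its value is within tolerance. If that case matters, one alternative is to subtract the matrices and check the largest absolute stored difference. This is a different technique from the index matching above, sketched here on the assumption that both inputs can be converted to CSR. allclose_diff is a hypothetical name, and only an absolute tolerance is applied (no rtol).

```python
import numpy as np
from scipy import sparse

def allclose_diff(A, B, atol=1e-8):
    """Hypothetical helper: True if every entry of |A - B| is <= atol."""
    if A.shape != B.shape:
        return False

    # The difference stays sparse: it can only have entries where
    # A or B stores one, so no dense array is ever built.
    diff = A.tocsr() - B.tocsr()
    if diff.nnz == 0:
        return True

    return bool(np.abs(diff.data).max() <= atol)

A = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
B = sparse.csr_matrix(np.array([[1.0, 1e-12], [0.0, 2.0]]))  # tiny extra entry
C = sparse.csr_matrix(np.array([[1.0, 0.5], [0.0, 2.0]]))    # large extra entry

print(allclose_diff(A, B))  # True
print(allclose_diff(A, C))  # False
```

The trade-off is that the subtraction allocates a temporary with up to nnz(A) + nnz(B) stored entries, whereas the index-matching versions can return before comparing any values.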