
Is SciPy's sign() not guaranteed to work?


I have the adjacency matrix A of a graph, stored as a SciPy sparse matrix. After A = A.sign(), some elements are still not 1, 0, or -1.

In [35]: A = A.sign()

In [36]: A.getcol(0).data
Out[36]: 
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,    
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  2.])

In [37]: A
Out[37]: 
<519403x519403 sparse matrix of type '<type 'numpy.float64'>'
    with 3819116 stored elements in COOrdinate format>

On the other hand, numpy.sign() works fine on the dense version of the same column.

In [50]: a = A.getcol(0)

In [51]: np.sum(a.todense())
Out[51]: 58.0

In [52]: np.sum(np.sign(a.todense()))
Out[52]: 57.0

Solution

  • After some research I found the answer. It comes down to the internal data structure SciPy uses for the COO format.

    import numpy as np
    from scipy.sparse import coo_matrix
    
    xs = np.array([1, 2, 3, 3, 2])
    ys = np.array([2, 3, 1, 1, 1])
    A = coo_matrix((np.ones((5,)), (xs, ys)))
    

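    At this point A is a <4x4 sparse matrix of type '<type 'numpy.float64'>' with 5 stored elements in COOrdinate format>, even though two of the elements share the same coordinate (3, 1). A.sign() operates only on those 5 stored values, which are all 1 to begin with, so it changes nothing. The duplicates are summed only when the matrix is converted, for example by todense(), which is why the dense result still contains a 2.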

    >>> A.data
    array([ 1.,  1.,  1.,  1.,  1.])
    
    >>> A.todense()
    matrix([[ 0.,  0.,  0.,  0.],
            [ 0.,  0.,  1.,  0.],
            [ 0.,  1.,  0.,  1.],
            [ 0.,  2.,  0.,  0.]])
    
    >>> A = A.sign()
    >>> A.todense()
    matrix([[ 0.,  0.,  0.,  0.],
            [ 0.,  0.,  1.,  0.],
            [ 0.,  1.,  0.,  1.],
            [ 0.,  2.,  0.,  0.]])