python numpy matrix indexing sparse-matrix

Indexing to access elements of a matrix


In the example below, I create idxL and want to loop through its elements to carry out other operations. I'm trying to understand why idxL[0][0] returns [[ True False False False False]] instead of just True. idxL.item(0) seems to work, and I think I could use it to loop over every item in idxL. However, I suspect that would not be efficient once I start handling bigger arrays.

import numpy as np
from scipy.sparse import csr_matrix
a=['foo','panda','donkey','bird','egg']
b='foo'
idxL=csr_matrix((1,5), dtype=bool)
idxTemp=np.array(list(map(lambda x: x in b, a)))
idxL = idxL + idxTemp
print(idxL[0][0])
print(idxL.item(0))

Solution

    In [193]: from scipy import sparse
    In [194]: a=['foo','panda','donkey','bird','egg'] 
         ...: b='foo' 
         ...: idxL=sparse.csr_matrix((1,5), dtype=bool) 
         ...: idxTemp=np.array(list(map(lambda x: x in b, a)))  
    

    The sparse matrix:

    In [195]: idxL                                                                  
    Out[195]: 
    <1x5 sparse matrix of type '<class 'numpy.bool_'>'
        with 0 stored elements in Compressed Sparse Row format>
    In [196]: idxL.A                                                                
    Out[196]: array([[False, False, False, False, False]])
    

    The dense array; note that it is 1-D:

    In [197]: idxTemp                                                               
    Out[197]: array([ True, False, False, False, False])
    

    Indexing the sparse matrix:

    In [198]: idxL[0,0]                                                             
    Out[198]: False
    

    The addition; the result is now a dense np.matrix, not a sparse matrix:

    In [199]: idxLL = idxL + idxTemp                                                
    In [200]: idxLL                                                                 
    Out[200]: matrix([[ True, False, False, False, False]])
    In [201]: idxLL[0,0]                                                            
    Out[201]: True
    

    Indexing an np.matrix with [0] selects the first row, but the result is still 2-D, so a second [0] just selects that same row again; that is why [0][0] doesn't help. Chained [0][0] indexing does work on a 2-D ndarray, but [0,0] is generally better.

    In [202]: idxLL[0]                                                              
    Out[202]: matrix([[ True, False, False, False, False]])
    In [203]: idxTemp[0]                                                            
    Out[203]: True
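
To make the difference concrete, here is a small sketch that compares np.matrix and ndarray indexing side by side. It does not use scipy. The matrix is built by hand to mimic the idxLL result above, and the variable names are only illustrative.

```python
import numpy as np

# Hand-built stand-in for the dense result of idxL + idxTemp (an assumption
# for illustration; the real object comes from the sparse addition).
m = np.matrix([[True, False, False, False, False]])

row = m[0]           # np.matrix stays 2-D: shape (1, 5)
row_again = m[0][0]  # selects row 0 of that 1-row matrix again: shape (1, 5)
elem = m[0, 0]       # a single row/column index pair gives the scalar

# A plain 2-D ndarray drops one dimension per integer index.
a = np.array([[True, False, False, False, False]])
a_row = a[0]         # shape (5,)
a_elem = a[0][0]     # scalar, but builds a temporary row first
a_elem2 = a[0, 0]    # scalar, in one indexing step

# If plain ndarray behaviour is wanted, convert the matrix once and flatten it.
flat = np.asarray(m).ravel()   # shape (5,)

print(row.shape, row_again.shape, bool(elem))
print(a_row.shape, bool(a_elem), bool(a_elem2), flat.shape)
```

Once the values are in a 1-D ndarray such as flat, ordinary iteration or boolean-mask indexing works on the elements directly.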
    

    edit

    We can make a sparse matrix from idxTemp directly:

    In [257]: M = sparse.csr_matrix(idxTemp)                                        
    In [258]: M                                                                     
    Out[258]: 
    <1x5 sparse matrix of type '<class 'numpy.bool_'>'
        with 1 stored elements in Compressed Sparse Row format>
    In [259]: M.A                                                                   
    Out[259]: array([[ True, False, False, False, False]])
    In [260]: print(M)                                                              
      (0, 0)    True
    

    There's no need to add it to idxL, although it can be added, and the sum of two sparse matrices stays sparse:

    In [261]: idxL+M                                                                
    Out[261]: 
    <1x5 sparse matrix of type '<class 'numpy.bool_'>'
        with 1 stored elements in Compressed Sparse Row format>
    

    I wouldn't recommend building a sparse matrix by adding matrices.
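
As a rough sketch of the alternative, assuming the goal is to loop over the matching entries: build the sparse matrix straight from the boolean mask and visit only the stored elements. The names mask, hits and matched are mine, not from the question, and whether this beats a plain dense mask depends on how sparse the real data is.

```python
import numpy as np
from scipy import sparse

a = ['foo', 'panda', 'donkey', 'bird', 'egg']
b = 'foo'

# 1-D boolean mask, same test as the lambda in the question (x in b).
mask = np.array([x in b for x in a])

# Build the sparse matrix directly from the mask; csr_matrix makes it 1x5.
M = sparse.csr_matrix(mask)

# Scalar access uses a single [row, col] index pair.
first = M[0, 0]

# Visit only the stored (nonzero) entries instead of every cell.
rows, cols = M.nonzero()
hits = [(int(i), int(j)) for i, j in zip(rows, cols)]
matched = [a[j] for j in cols]

print(M.shape, M.nnz, bool(first))
print(hits, matched)
```

Looping over the nonzero coordinates scales with the number of stored elements rather than with the full shape, which is usually the reason for choosing a sparse format.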