Search code examples
pythonmatrixscipysparse-matrix

Sparse matrix with filled row and diagonal


I have a sparse matrix that I am constructing using several diagonals:

A = diags([np.arange(100), np.arange(99), np.arange(99)], offsets=[0, -1, 1])

However, this sparse matrix also has a vector of ones in the last row. Is there any way to store this in the sparse matrix or is my construct to inefficient and I should work with a dense matrix?


Solution

  • sparse.diags does make a sparse matrix with a special diagonal format:

    In [591]: A = sparse.diags([np.arange(100), np.arange(99), np.arange(99)], offse
         ...: ts=[0, -1, 1])
    In [592]: A
    Out[592]: 
    <100x100 sparse matrix of type '<class 'numpy.float64'>'
        with 298 stored elements (3 diagonals) in DIAgonal format>
    In [593]: A.A
    Out[593]: 
    array([[  0.,   0.,   0., ...,   0.,   0.,   0.],
           [  0.,   1.,   1., ...,   0.,   0.,   0.],
           [  0.,   1.,   2., ...,   0.,   0.,   0.],
           ..., 
           [  0.,   0.,   0., ...,  97.,  97.,   0.],
           [  0.,   0.,   0., ...,  97.,  98.,  98.],
           [  0.,   0.,   0., ...,   0.,  98.,  99.]])
    

    But the storage isn't significantly more efficient than other sparse formats. Other formats have to store the same 298 values. They'll just index them differently.

    We can set the last row in various ways.

    We can't index the last row directly with the sparse format.

    In [594]: A[-1,:]
    ...
    TypeError: 'dia_matrix' object is not subscriptable
    

    but we can convert it to csr format, and set its row value:

    In [595]: A.tocsr()[-1,:]
    Out[595]: 
    <1x100 sparse matrix of type '<class 'numpy.float64'>'
        with 2 stored elements in Compressed Sparse Row format>
    In [596]: Ac = A.tocsr()
    In [597]: Ac[-1,:]=1
    /usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive.
    In [598]: Ac
    Out[598]: 
    <100x100 sparse matrix of type '<class 'numpy.float64'>'
        with 393 stored elements in Compressed Sparse Row format>
    In [599]: Ac.A
    Out[599]: 
    array([[  0.,   0.,   0., ...,   0.,   0.,   0.],
           [  0.,   1.,   1., ...,   0.,   0.,   0.],
           [  0.,   1.,   2., ...,   0.,   0.,   0.],
           ..., 
           [  0.,   0.,   0., ...,  97.,  97.,   0.],
           [  0.,   0.,   0., ...,  97.,  98.,  98.],
           [  1.,   1.,   1., ...,   1.,   1.,   1.]])
    

    Here I wouldn't worry about the sparsity warning; that's meant more for cases where the action is done iteratively. I could have used tolil() instead. Keep in mind that the csr format gets use for calculations. And the coo format is use when combining blocks of matrices.


    I just checked the sparse.dia_matrix code. For your array A.data is a (3,100) array. It 'squared off' your ragged input. A.offsets is a 3 element array.

    A.tocoo() stores the values in 3 (295,) arrays (removing the 3 0's in your definition). A A.tocsr() stores 2 (295,) arrays plus a (101,) indptr array. So the dia format is more compact, but only so long as you can work in the format.


    To append the ones row instead, use sparse.vstack (vstack uses coo format to construct the new matrix):

    In [619]: B = sparse.vstack((A,np.ones((1,100))))
    In [620]: B
    Out[620]: 
    <101x100 sparse matrix of type '<class 'numpy.float64'>'
        with 395 stored elements in COOrdinate format>
    

    Out of curiosity I tried vstack with a dia output - it doesn't like it, because the squared off dia data would be too large.

    In [621]: B = sparse.vstack((A,np.ones((1,100))),format='dia')
    /usr/local/lib/python3.5/dist-packages/scipy/sparse/coo.py:359: SparseEfficiencyWarning: Constructing a DIA matrix with 102 diagonals is inefficient
    

    Assignment with the lil format doesn't produce any warnings:

    In [624]: Al = A.tolil()
    In [625]: Al[-1,:]=1
    In [626]: Al
    Out[626]: 
    <100x100 sparse matrix of type '<class 'numpy.float64'>'
        with 393 stored elements in LInked List format>
    

    This too is converted to csr for most calculations.