Search code examples
pythonconstructorscipysparse-matrix

Constructing sparse matrix from list of lists of tuples


I have a Python list of row information for a sparse matrix. Each row is represented as a list of (column, value) tuples. Call it alist:

alist = [[(1,10), (3,-3)],
         [(2,12)]]

How can I efficiently construct a scipy sparse matrix from this list of lists, resulting in a matrix like this:

0  10   0  -3
0   0  12   0

The obvious approach is to make a scipy.sparse.lil_matrix, which internally has this "list of lists" structure. But from the scipy.sparse.lil_matrix — SciPy v0.19.0 Reference Guide I see just three ways of constructing them:

  • starting from a dense array
  • starting from another sparse array
  • just constructing an empty array

So the only way to get fresh data in is either to solve this problem with some other sparse matrix representation, or to start with a dense array, neither of which address the initial problem, and both of which seem likely to be less efficient representations than lil_matrix itself for this data.

I guess I can make an empty one, and use a loop to add values, but surely I'm missing something.

The scipy documentation is really frustrating when it comes to sparse matrices.


Solution

  • Your data layout is an unusual one. Here's my first stab at using it.

    In [565]: M = sparse.lil_matrix((2,4), dtype=int)
    In [566]: M
    Out[566]: 
    <2x4 sparse matrix of type '<class 'numpy.int32'>'
        with 0 stored elements in LInked List format>
    In [567]: for i,row in enumerate(alist):
         ...:     for col in row:
         ...:         M[i, col[0]] = col[1]
         ...:         
    In [568]: M
    Out[568]: 
    <2x4 sparse matrix of type '<class 'numpy.int32'>'
        with 3 stored elements in LInked List format>
    In [569]: M.A
    Out[569]: 
    array([[ 0, 10,  0, -3],
           [ 0,  0, 12,  0]])
    

    Yes, it is iterative; and lil is the best format for that purpose.

    Or using the common coo style of inputs:

    In [580]: data,col,row = [],[],[]
    In [581]: for i, rr in enumerate(alist):
         ...:     for cc in rr:
         ...:         row.append(i)
         ...:         col.append(cc[0])
         ...:         data.append(cc[1])
         ...:         
    In [582]: data,col,row
    Out[582]: ([10, -3, 12], [1, 3, 2], [0, 0, 1])
    In [583]: M1=sparse.coo_matrix((data,(row,col)),shape=(2,4))
    In [584]: M1
    Out[584]: 
    <2x4 sparse matrix of type '<class 'numpy.int32'>'
        with 3 stored elements in COOrdinate format>
    In [585]: M1.A
    Out[585]: 
    array([[ 0, 10,  0, -3],
           [ 0,  0, 12,  0]])
    

    Another option is to create the blank lil matrix, and directly fill in its attributes:

    In other words, start with:

    In [591]: m.data
    Out[591]: array([[], []], dtype=object)
    In [592]: m.rows
    Out[592]: array([[], []], dtype=object)
    

    and change them to:

    In [587]: M.data
    Out[587]: array([[10, -3], [12]], dtype=object)
    In [588]: M.rows
    Out[588]: array([[1, 3], [2]], dtype=object)
    

    It would still require the 2 level iteration on your alist structure.

    In [593]: for i, rr in enumerate(alist):
         ...:     for cc in rr:
         ...:         m.rows[i].append(cc[0])
         ...:         m.data[i].append(cc[1])       
    In [594]: m
    Out[594]: 
    <2x4 sparse matrix of type '<class 'numpy.int32'>'
        with 3 stored elements in LInked List format>
    In [595]: m.A
    Out[595]: 
    array([[ 0, 10,  0, -3],
           [ 0,  0, 12,  0]])
    

    In another comment you mentioned the difficulty in understanding the csr indptr. The easiest way to get that is to convert one these formats:

    In [597]: Mr=M.tocsr()
    In [598]: Mr.indptr
    Out[598]: array([0, 2, 3], dtype=int32)
    In [599]: Mr.data
    Out[599]: array([10, -3, 12])
    In [600]: Mr.indices
    Out[600]: array([1, 3, 2], dtype=int32)