I have a Python list of row information for a sparse matrix. Each row is represented as a list of (column, value) tuples. Call it alist
:
alist = [[(1,10), (3,-3)],
[(2,12)]]
How can I efficiently construct a scipy sparse matrix from this list of lists, resulting in a matrix like this:
0 10 0 -3
0 0 12 0
The obvious approach is to make a scipy.sparse.lil_matrix
, which internally has this "list of lists" structure. But from the scipy.sparse.lil_matrix — SciPy v0.19.0 Reference Guide I see just three ways of constructing them:
So the only way to get fresh data in is either to solve this problem with some other sparse matrix representation, or to start with a dense array, neither of which address the initial problem, and both of which seem likely to be less efficient representations than lil_matrix
itself for this data.
I guess I can make an empty one, and use a loop to add values, but surely I'm missing something.
The scipy documentation is really frustrating when it comes to sparse matrices.
Your data layout is an unusual one. Here's my first stab at using it.
In [565]: M = sparse.lil_matrix((2,4), dtype=int)
In [566]: M
Out[566]:
<2x4 sparse matrix of type '<class 'numpy.int32'>'
with 0 stored elements in LInked List format>
In [567]: for i,row in enumerate(alist):
...: for col in row:
...: M[i, col[0]] = col[1]
...:
In [568]: M
Out[568]:
<2x4 sparse matrix of type '<class 'numpy.int32'>'
with 3 stored elements in LInked List format>
In [569]: M.A
Out[569]:
array([[ 0, 10, 0, -3],
[ 0, 0, 12, 0]])
Yes, it is iterative; and lil
is the best format for that purpose.
Or using the common coo
style of inputs:
In [580]: data,col,row = [],[],[]
In [581]: for i, rr in enumerate(alist):
...: for cc in rr:
...: row.append(i)
...: col.append(cc[0])
...: data.append(cc[1])
...:
In [582]: data,col,row
Out[582]: ([10, -3, 12], [1, 3, 2], [0, 0, 1])
In [583]: M1=sparse.coo_matrix((data,(row,col)),shape=(2,4))
In [584]: M1
Out[584]:
<2x4 sparse matrix of type '<class 'numpy.int32'>'
with 3 stored elements in COOrdinate format>
In [585]: M1.A
Out[585]:
array([[ 0, 10, 0, -3],
[ 0, 0, 12, 0]])
Another option is to create the blank lil
matrix, and directly fill in its attributes:
In other words, start with:
In [591]: m.data
Out[591]: array([[], []], dtype=object)
In [592]: m.rows
Out[592]: array([[], []], dtype=object)
and change them to:
In [587]: M.data
Out[587]: array([[10, -3], [12]], dtype=object)
In [588]: M.rows
Out[588]: array([[1, 3], [2]], dtype=object)
It would still require the 2 level iteration on your alist
structure.
In [593]: for i, rr in enumerate(alist):
...: for cc in rr:
...: m.rows[i].append(cc[0])
...: m.data[i].append(cc[1])
In [594]: m
Out[594]:
<2x4 sparse matrix of type '<class 'numpy.int32'>'
with 3 stored elements in LInked List format>
In [595]: m.A
Out[595]:
array([[ 0, 10, 0, -3],
[ 0, 0, 12, 0]])
In another comment you mentioned the difficulty in understanding the csr
indptr
. The easiest way to get that is to convert one these formats:
In [597]: Mr=M.tocsr()
In [598]: Mr.indptr
Out[598]: array([0, 2, 3], dtype=int32)
In [599]: Mr.data
Out[599]: array([10, -3, 12])
In [600]: Mr.indices
Out[600]: array([1, 3, 2], dtype=int32)