python scipy sparse-matrix

What is the most efficient way to convert from a list of values to a scipy sparse matrix?


I have a list of values that I'm using a loop to convert to a scipy.sparse.dok_matrix. I'm aware of numpy.bincount but it doesn't work with sparse matrices. I'm wondering if there is a more efficient way to perform this conversion because the construction time for a dok_matrix is really long.

The example below handles one row, but I'm scaling to a 2D matrix by looping. The number of times a value x appears in the input list is the value of the xth element of the result matrix.

from scipy.sparse import dok_matrix

values = [1, 3, 3, 4]
expected_result = [0, 1, 0, 2, 1]

matrix = dok_matrix((1, MAXIMUM_EXPECTED_VALUE))
for value in values:
    # Read the current count (0 if absent) and increment it.
    matrix[0, value] = matrix.get((0, value)) + 1

MAXIMUM_EXPECTED_VALUE is on the order of 100000000 but len(values) < 100, which is why I'm using a sparse matrix. Possibly off-topic: only a little over 10000 distinct values in the range of MAXIMUM_EXPECTED_VALUE are actually used, but I think hashing to a contiguous range and converting back might be more complicated.


Solution

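  • It looks like the standard coo-style input suits your case. Duplicate (row, col) entries are summed when the matrix is converted to a dense array or to CSR, which gives the counts: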

    In [142]: import numpy as np
    In [143]: from scipy import sparse
    In [144]: values = [1,3,3,4]
    In [145]: col = np.array(values)
    In [146]: row = np.zeros_like(col)
    In [147]: data = np.ones_like(col)
    In [148]: M = sparse.coo_matrix((data, (row,col)), shape=(1,10))
    In [149]: M
    Out[149]: 
    <1x10 sparse matrix of type '<class 'numpy.int64'>'
        with 4 stored elements in COOrdinate format>
    In [150]: M.toarray()
    Out[150]: array([[0, 1, 0, 2, 1, 0, 0, 0, 0, 0]])
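The same coo-style input can probably cover the 2D case from the question without a Python-level loop over values. The following is only a sketch under assumptions: the helper name `count_matrix`, its parameters, and the sample input are made up for illustration, and the input is assumed to be a non-empty list of lists of non-negative integers smaller than `n_cols`.

```python
import numpy as np
from scipy import sparse


def count_matrix(rows_of_values, n_cols):
    """Hypothetical helper: entry (i, x) counts how often x occurs in rows_of_values[i]."""
    lengths = [len(vals) for vals in rows_of_values]
    # Column index of every value, concatenated across all rows.
    col = np.concatenate([np.asarray(vals, dtype=np.int64) for vals in rows_of_values])
    # Row index i repeated once for each value in row i.
    row = np.repeat(np.arange(len(rows_of_values)), lengths)
    # Each occurrence contributes 1.
    data = np.ones(col.shape[0], dtype=np.int64)
    coo = sparse.coo_matrix((data, (row, col)), shape=(len(rows_of_values), n_cols))
    # Converting to CSR sums duplicate (row, col) entries into counts.
    return coo.tocsr()


counts = count_matrix([[1, 3, 3, 4], [0, 0, 2]], n_cols=10)
print(counts.toarray())
```

Only the index and data arrays are materialized, so memory should scale with the number of input values rather than with `n_cols`; the dense `toarray()` call here is just for display on a small example.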