python pandas scipy sparse-matrix

Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory


Is there a way to convert from a pandas.SparseDataFrame to scipy.sparse.csr_matrix, without generating a dense matrix in memory?

scipy.sparse.csr_matrix(df.values)

doesn't work as it generates a dense matrix which is cast to the csr_matrix.

Thanks in advance!


Solution

  • The pandas docs describe an experimental conversion to scipy sparse, SparseSeries.to_coo:

    http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse

    ================

    Edit: SparseSeries.to_coo is a special function for a series with a MultiIndex, not for a data frame. See the other answers for the data frame case. Note the difference in dates.

    ============

    As of 0.20.0, there is a sdf.to_coo() for sparse data frames and a ss.to_coo() for sparse series with a MultiIndex. Since a sparse matrix is inherently 2d, it makes sense to require a MultiIndex for the (effectively) 1d series, while the data frame can already represent a table or 2d array.
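
A minimal sketch of the data frame case, assuming pandas 1.0 or later. SparseDataFrame was removed in that release, so this uses the df.sparse accessor rather than the sdf.to_coo() spelling above. The column names and sample values are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Illustrative data: a small scipy matrix wrapped in a DataFrame whose
# columns all have a sparse dtype (fill value 0).
source = sparse.csr_matrix(
    np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
)
df = pd.DataFrame.sparse.from_spmatrix(source, columns=["a", "b", "c"])

# The accessor assembles a COO matrix from the stored (non-fill) values,
# so df.values is never called on the frame.
coo = df.sparse.to_coo()

# COO to CSR is a sparse-to-sparse conversion.
csr = coo.tocsr()
```

This only applies when every column has a sparse dtype; an ordinary dense DataFrame has no such conversion path.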

    When I first responded to this question (June 2015), this sparse dataframe/series feature was experimental.
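
For the MultiIndex series case, here is a sketch along the same lines, again assuming pandas 1.0 or later, where the method lives under the Series.sparse accessor. The level names "row" and "col" and the labels are hypothetical.

```python
import pandas as pd

# A 1d series whose two index levels supply the row and column labels.
index = pd.MultiIndex.from_tuples(
    [("r0", "c1"), ("r1", "c0"), ("r2", "c2")], names=["row", "col"]
)
ss = pd.Series([1.0, 2.0, 3.0], index=index, dtype="Sparse[float64]")

# to_coo returns the matrix plus the labels that each row and column
# position corresponds to.
A, rows, columns = ss.sparse.to_coo(
    row_levels=["row"], column_levels=["col"], sort_labels=True
)
```

Here A is a scipy COO matrix with one entry per stored value, and rows and columns map matrix positions back to the index labels.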