numpy, matrix, scipy, sparse-matrix

NumPy: Importing a Sparse Matrix from R into Python


I have a very large, sparse matrix in R, created with the 'Matrix' package, and I want to work with it in Python with NumPy. The R object is in the CSC format, and if I export it using the writeMM function from the Matrix package, the output looks something like this:

%%MatrixMarket matrix coordinate real general
4589 17366 160441
22 1 5.954510725783322
36 1 29.77255362891661
41 1 23.81804290313329
74 1 5.954510725783322
116 1 59.54510725783322
127 1 11.909021451566645
159 1 17.863532177349967

Here the first column is the row index, the second is the column index, and the third is the value.

I was wondering how I could import that into Python. I see that scipy has a module for working with compressed sparse column matrices, but I could not find a function to create one from a file.


Solution

  • You can use scipy.io.mmread, which reads Matrix Market files and does exactly what you want.

    In [10]: from scipy.io import mmread

    In [11]: mmread("sparse_from_file")
    Out[11]: 
    <4589x17366 sparse matrix of type '<class 'numpy.float64'>'
        with 7 stored elements in COOrdinate format>
    

    Note that the result is a COO sparse matrix. If you want a csc_matrix, you can convert it by calling its tocsc() method (sparse.coo_matrix.tocsc).

    You mention that you want to handle this very large, sparse matrix with numpy. That might turn out to be impractical: numpy operates on dense arrays only, and if your matrix is indeed very large and sparse, you probably can't afford to store it in dense format.

    So you may be better off staying with whichever scipy.sparse format is most efficient for your use case.
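
The whole round trip can be sketched as below. This is a minimal illustration, not the asker's actual data: the tiny matrix, its values, and the temporary file name are made up for the example, and mmwrite stands in for R's writeMM so that the snippet is self-contained. It reads the file with mmread, converts the result to CSC, and compares the size of the sparse storage with what a dense array of the same shape would need.

```python
import os
import tempfile

import numpy as np
from scipy import sparse
from scipy.io import mmread, mmwrite

# Hypothetical stand-in for the matrix exported from R with writeMM.
# Indices here are 0-based; the Matrix Market file itself is 1-based.
rows = np.array([0, 2, 3])
cols = np.array([0, 1, 1])
vals = np.array([5.5, 29.75, 23.0])
small = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse_from_file.mtx")
    mmwrite(path, small)   # write Matrix Market coordinate format
    loaded = mmread(path)  # read it back as a COO sparse matrix

# Convert to compressed sparse column format, matching the R object.
csc = loaded.tocsc()

# Bytes a dense array of the same shape and dtype would occupy.
dense_bytes = csc.shape[0] * csc.shape[1] * csc.dtype.itemsize

# Bytes actually held by the three CSC arrays.
sparse_bytes = csc.data.nbytes + csc.indices.nbytes + csc.indptr.nbytes

print(csc.format, csc.shape, csc.nnz)
print("dense bytes:", dense_bytes, "sparse bytes:", sparse_bytes)
```

On a matrix this small the sparse arrays bring no saving, but the same arithmetic applied to the matrix in the question is telling: a dense float64 array of shape 4589 x 17366 would need 4589 * 17366 * 8 = 637,540,592 bytes (about 0.64 GB), while the 160441 stored values take about 1.3 MB plus their index arrays.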