Tags: python, scipy, sparse-matrix, rpy2, glmnet

Running glmnet with rpy2 on sparse design matrix?


I have a Python snippet that runs glmnet without problems when X and y are NumPy arrays. However, when X is a SciPy sparse matrix in compressed sparse column (CSC) format, the call fails because rpy2 cannot convert X. Am I making an obvious mistake?

An MCVE:

import numpy as np
from scipy import sparse
from rpy2 import robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects import numpy2ri
from rpy2.robjects import pandas2ri

if __name__ == "__main__":
    X = sparse.rand(5, 20, density=0.1)
    y = np.random.randn(5)
    numpy2ri.activate()
    pandas2ri.activate()

    utils = rpackages.importr('utils')
    utils.chooseCRANmirror(ind=1) 
    if not rpackages.isinstalled('glmnet'):
        utils.install_packages("glmnet")
    glmnet = rpackages.importr('glmnet')

    glmnet = robjects.r['glmnet']
    glmnet_fit = glmnet(X, y, intercept=False, standardize=False)

And when I run it I get a NotImplementedError:

Conversion 'py2ri' not defined for objects of type '<class 'scipy.sparse.csc.csc_matrix'>'

Could I provide X in a different way? I'd be surprised if rpy2 could not handle sparse matrices.


Solution

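  • You can build the sparse matrix on the R side through rpy2. Convert X to COO format and pass its row indices, column indices, and values to sparseMatrix from R's Matrix package, adding 1 to the indices because R is 1-based: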

    import numpy as np
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr
    from scipy import sparse

    # COO format exposes the nonzero entries as (row, col, data) triplets.
    X = sparse.rand(5, 20, density=0.1).tocoo()
    r_Matrix = importr("Matrix")
    # R indices are 1-based, so shift the 0-based SciPy indices by one.
    r_X = r_Matrix.sparseMatrix(
        i=ro.IntVector(X.row + 1),
        j=ro.IntVector(X.col + 1),
        x=ro.FloatVector(X.data),
        dims=ro.IntVector(X.shape))
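
To connect this back to the question, here is a minimal sketch of how the conversion might be wrapped and handed to glmnet. The helper names coo_triplets_1based and fit_glmnet_sparse are hypothetical (they are not part of SciPy, rpy2, or glmnet), and the sketch assumes an R installation with the Matrix and glmnet packages available. It has not been checked against any particular rpy2 version, so treat it as a starting point rather than a definitive implementation.

```python
import numpy as np
from scipy import sparse


def coo_triplets_1based(X):
    """Return (i, j, x, dims) for a SciPy sparse matrix, with 1-based indices.

    Hypothetical helper for illustration; plain Python lists are returned
    so the values can be handed to rpy2 vector constructors directly.
    """
    coo = sparse.coo_matrix(X)  # accepts CSC, CSR, COO, ...
    i = (coo.row + 1).tolist()  # R row indices start at 1
    j = (coo.col + 1).tolist()  # R column indices start at 1
    x = coo.data.astype(float).tolist()
    dims = [int(n) for n in coo.shape]
    return i, j, x, dims


def fit_glmnet_sparse(X, y, **glmnet_kwargs):
    """Sketch: build an R sparse matrix from X and pass it to glmnet.

    Assumes R with the Matrix and glmnet packages installed. rpy2 is
    imported here so the helper above stays usable without R.
    """
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    i, j, x, dims = coo_triplets_1based(X)
    r_Matrix = importr("Matrix")
    r_X = r_Matrix.sparseMatrix(
        i=ro.IntVector(i),
        j=ro.IntVector(j),
        x=ro.FloatVector(x),
        dims=ro.IntVector(dims))
    glmnet = importr("glmnet")
    r_y = ro.FloatVector(np.asarray(y, dtype=float).tolist())
    return glmnet.glmnet(r_X, r_y, **glmnet_kwargs)
```

Under those assumptions, the call from the question would become fit_glmnet_sparse(X, y, intercept=False, standardize=False). Keeping the index shift in its own function makes the 0-based to 1-based step easy to check without starting R.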