scikit-learn  sparse-matrix  data-science  polynomials  sklearn-pandas

How to make polynomial features from a sparse matrix in Scikit-learn


I am using Scikit-learn to convert my training data to polynomial features and then fit a linear model to them.

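from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)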

But it throws an error:

TypeError: A sparse matrix was passed, but dense data is required

I know my data is in sparse matrix format. When I try to convert it to a dense matrix, I get a memory error because the data is huge (~50k), so converting it to a dense matrix is not an option.

I also found GitHub issues where this feature has been requested, but it is still not implemented.

Can someone please tell me how to use sparse data with PolynomialFeatures in Scikit-learn without converting it to a dense format?


Solution

  • Sparse input to PolynomialFeatures is a new feature in the upcoming 0.20 version of sklearn; see Release History - V0.20 - Enhancements. If you really want to try it before the release, you can install the development version by following the instructions in Sklearn - Advanced Installation - Install Bleeding Edge.
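
    As a rough sketch of what this looks like once a scikit-learn version with sparse support (0.20 or later) is installed, the pipeline from the question can be fed a SciPy sparse matrix directly. The data below is random and purely illustrative: the shape, density, and variable names are assumptions, not the asker's data.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data: 200 samples, 10 features, about 5% non-zero.
rng = np.random.RandomState(0)
X = sp.random(200, 10, density=0.05, format="csr", random_state=rng)
y = rng.rand(200)

# With sparse support, the transformer takes sparse input
# and returns a sparse matrix instead of raising a TypeError.
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
print(sp.issparse(X_poly), X_poly.shape)

# The same pipeline as in the question, fitted without densifying X.
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)
predictions = model.predict(X)
```

    Note that the number of output columns still grows quickly with the degree and the number of input features, so even a sparse result can become large; lowering the degree or passing interaction_only=True are possible ways to limit that.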