I am using Scikit-learn to convert my training data to polynomial features and then fit a linear model to it.
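from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)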
But it throws an error:
TypeError: A sparse matrix was passed, but dense data is required
I know my data is in sparse matrix format. When I try to convert it to a dense matrix, I get a memory error, because my data is huge (~50k) and too large to hold in dense form.
I also found GitHub issues where this feature is requested, but it is still not implemented.
Can someone please tell me how to use sparse data with PolynomialFeatures in Scikit-learn without converting it to a dense format?
This is a new feature in the upcoming 0.20 version of sklearn. See Release History - V0.20 - Enhancements. If you really want to test it out, you could install the development version by following the instructions in Sklearn - Advanced Installation - Install Bleeding Edge.
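For illustration, here is a minimal sketch of what the pipeline from the question could look like once sparse input is supported. It assumes scikit-learn 0.20 or later and SciPy are installed. The data is synthetic and made up for the example (a small random CSR matrix and a random target), so the sizes are not meant to reflect the real dataset.

```python
import numpy as np
from scipy import sparse
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption for this sketch, not the asker's data):
# 200 samples, 10 features, roughly 10% non-zero entries, stored as CSR.
rng = np.random.RandomState(0)
X = sparse.random(200, 10, density=0.1, format="csr", random_state=rng)
y = rng.rand(200)

# Same pipeline as in the question; X is passed in sparse form directly.
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])
model.fit(X, y)

# Inspect the expanded features: they should still be a sparse matrix.
X_poly = model.named_steps['poly'].transform(X)
print(sparse.issparse(X_poly), X_poly.shape)

predictions = model.predict(X)
```

Note that even in sparse form, a degree-3 expansion grows quickly with the number of input features, so memory use may still be a concern on very wide data. Lowering the degree or passing interaction_only=True to PolynomialFeatures are possible ways to reduce the output size.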