Tags: python, svm, sparse-matrix, scikit-learn

How to speed up sklearn SVR?


I am implementing SVR in Python with scikit-learn's SVR class (sklearn.svm.SVR). My sparse matrix is of size 146860 x 10202. I have divided it into sub-matrices of size 2500 x 10202, and fitting an SVR on each sub-matrix takes about 10 minutes. What are the ways to speed up the process? Suggestions for a different approach or a different Python package are also welcome. Thanks!


Solution

  • You can average the predictions of the SVR sub-models.
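A minimal sketch of the averaging idea, assuming one SVR is fitted per block of rows as in the question. The helper names `fit_svr_chunks` and `predict_mean` are hypothetical, the rows are shuffled before chunking (my assumption, so each chunk sees the whole distribution), and the data is a small random sparse stand-in, not the 146860 x 10202 matrix.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import SVR


def fit_svr_chunks(X, y, chunk_size=2500, random_state=0, **svr_params):
    """Hypothetical helper: fit one SVR per random block of rows."""
    rng = np.random.RandomState(random_state)
    order = rng.permutation(X.shape[0])  # shuffle rows before splitting
    models = []
    for start in range(0, X.shape[0], chunk_size):
        idx = order[start:start + chunk_size]
        # SVR.fit returns the fitted estimator itself
        models.append(SVR(**svr_params).fit(X[idx], y[idx]))
    return models


def predict_mean(models, X):
    """Hypothetical helper: average the predictions of all sub-models."""
    return np.mean([m.predict(X) for m in models], axis=0)


# Small synthetic stand-in for the sparse matrix in the question
rng = np.random.RandomState(42)
X = sp.random(3000, 200, density=0.02, format="csr", random_state=rng)
y = np.asarray(X.sum(axis=1)).ravel()

models = fit_svr_chunks(X, y, chunk_size=1000, kernel="rbf", C=1.0)
y_pred = predict_mean(models, X)
print(len(models), y_pred.shape)  # 3 (3000,)
```

Because the sub-models are independent of each other, the loop could also be run in parallel (for example with joblib).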

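    Alternatively, you can fit a linear regression model on the output of a kernel expansion computed with the Nystroem method (sklearn.kernel_approximation.Nystroem), which approximates the kernel feature map with a fixed number of components.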
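A sketch of the Nystroem route under stated assumptions: the RBF kernel, `gamma=0.1`, `n_components=300` and the choice of `Ridge` (a regularized linear regression) are all illustrative values I picked, not tuned settings, and the data is synthetic. `Nystroem` accepts the sparse matrix directly and outputs a dense `n_samples x n_components` array.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
# Synthetic sparse data standing in for the real matrix
X = sp.random(2000, 100, density=0.05, format="csr", random_state=rng)
y = np.asarray(X.sum(axis=1)).ravel() + 0.1 * rng.standard_normal(2000)

model = make_pipeline(
    # Approximate RBF feature map built from 300 sampled rows (assumed values)
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    # Linear model fitted on the expanded features
    Ridge(alpha=1.0),
)
model.fit(X, y)
pred = model.predict(X)
r2 = model.score(X, y)  # R^2 on the training data
print(f"training R^2: {r2:.3f}")
```

The cost of the linear model grows with `n_components` rather than with the full number of samples squared, which is what makes this usable on all 146860 rows at once.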

    Or you can try other non-linear regression models, such as an ensemble of randomized trees or gradient boosted regression trees.
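For the tree-based alternatives, a minimal sketch using scikit-learn's `ExtraTreesRegressor` (extremely randomized trees) and `GradientBoostingRegressor`, both of which accept sparse input. The hyperparameters and the synthetic target are assumptions for illustration only.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = sp.random(1500, 300, density=0.02, format="csr", random_state=rng)
y = np.asarray(X[:, :10].sum(axis=1)).ravel()  # synthetic target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Ensemble of randomized trees; n_jobs=-1 builds the trees in parallel
    "extra_trees": ExtraTreesRegressor(n_estimators=100, n_jobs=-1, random_state=0),
    # Gradient boosted regression trees; stages are fitted one after another
    "gbrt": GradientBoostingRegressor(
        n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on held-out rows
    print(f"{name}: R^2 = {scores[name]:.3f}")
```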

    Edit: I forgot to say that the kernel SVR model itself is not scalable: its training complexity is more than quadratic in the number of samples, hence there is no way to really "speed it up".
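The complexity argument also explains why chunking helps at all: if fit time grew like n^2, splitting n rows into k chunks of n/k rows would cost k * (n/k)^2 = n^2 / k. A rough way to check the growth on your own data is to time fits on increasing subsample sizes. This is a measurement sketch on synthetic data; the timings depend on hardware and data, so no output is shown.

```python
import time

import numpy as np
import scipy.sparse as sp
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = sp.random(2000, 100, density=0.05, format="csr", random_state=rng)
y = np.asarray(X.sum(axis=1)).ravel()

fit_seconds = {}
for n in (500, 1000, 2000):
    start = time.perf_counter()
    SVR(kernel="rbf").fit(X[:n], y[:n])  # fit on the first n rows only
    fit_seconds[n] = time.perf_counter() - start

# Roughly 4x more time per doubling of n would indicate quadratic growth
for n in sorted(fit_seconds):
    print(f"n={n:5d}  fit time {fit_seconds[n]:.3f}s")
```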

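    Edit 2: Actually, scaling the input variables to [0, 1] or [-1, 1], or to unit variance with StandardScaler, can often speed up convergence quite a bit. For a sparse matrix, use StandardScaler(with_mean=False) or MaxAbsScaler, since centering the data would destroy its sparsity.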
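A minimal sketch of scaling sparse input in front of the SVR, assuming synthetic features whose column ranges differ by up to a factor of 1000. `MaxAbsScaler` maps each feature to [-1, 1] (to [0, 1] for non-negative data) without centering; `StandardScaler(with_mean=False)` divides by the standard deviation only. Putting the scaler in a pipeline ensures the same scaling is applied at prediction time.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, StandardScaler
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = sp.random(800, 50, density=0.1, format="csr", random_state=rng)
# Give the columns very different ranges (1 up to 1000) to mimic unscaled features
X = (X @ sp.diags(np.logspace(0, 3, 50))).tocsr()
y = np.asarray(X[:, :5].sum(axis=1)).ravel()

# Scale each feature by its maximum absolute value; the matrix stays sparse
maxabs_svr = make_pipeline(MaxAbsScaler(), SVR(kernel="rbf"))

# StandardScaler cannot center sparse data, so only divide by the std
std_svr = make_pipeline(StandardScaler(with_mean=False), SVR(kernel="rbf"))

for model in (maxabs_svr, std_svr):
    model.fit(X, y)

X_scaled = maxabs_svr[0].transform(X)
print(sp.issparse(X_scaled), abs(X_scaled).max())
```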

    Also, it is very unlikely that the default parameters will yield good results: you have to grid search the optimal values of gamma, and maybe also epsilon, on subsamples of increasing size (to check the stability of the optimal parameters) before fitting the large models.
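A sketch of that search, assuming `GridSearchCV` over `gamma` and `epsilon` repeated on nested random subsamples of 250, 500 and 1000 rows. The grid values, subsample sizes and synthetic data are placeholders; if the best parameters stay the same as the subsample grows, they are more likely to carry over to the full data.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = sp.random(2000, 50, density=0.1, format="csr", random_state=rng)
y = np.asarray(X[:, :5].sum(axis=1)).ravel() + 0.05 * rng.standard_normal(2000)

# Step names from make_pipeline are lowercase class names, hence "svr__"
param_grid = {
    "svr__gamma": [0.01, 0.1, 1.0],
    "svr__epsilon": [0.01, 0.1],
}

best_by_size = {}
perm = rng.permutation(X.shape[0])
for n in (250, 500, 1000):
    idx = perm[:n]  # nested subsamples: each one contains the previous one
    search = GridSearchCV(
        make_pipeline(MaxAbsScaler(), SVR(kernel="rbf")),
        param_grid,
        cv=3,  # n_jobs=-1 would evaluate the candidates in parallel
    )
    search.fit(X[idx], y[idx])
    best_by_size[n] = search.best_params_
    print(n, search.best_params_, round(search.best_score_, 3))
```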