python matlab scipy scikit-learn lasso-regression

Python LASSO maximum number of non-zero coefficients

I have a fairly large dataset with more than 100 features (and therefore more than 100 candidate coefficients) and thousands of entries. I would therefore like to use the Lasso approach for model training.

I am currently looking into the scikit-learn documentation for:

Although the implementation seems straightforward, I was unable to find an input argument that restricts the maximum number of non-zero coefficients, e.g. to 10.

To be clearer, in the MATLAB implementation of Lasso, the parameter 'DFMax' allows for the above.

Is there such an option in any Python implementation?

Solution

Directly restricting the number of nonzero coefficients is an NP-hard problem. One of the beauties of LASSO is that it replaces this problem with a convex one that can be solved efficiently and that, under suitable conditions, recovers the same sparse solution.
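
To make the contrast concrete, here is a sketch of the two problems in standard notation. The symbols (X, y, β, n, k, α) are not from the original post; the scaling of the squared-error term is chosen to match the objective documented for scikit-learn's Lasso.

```latex
% Cardinality-constrained least squares: at most k nonzero coefficients (NP-hard in general)
\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \lVert \beta \rVert_0 \le k

% LASSO: the \ell_1 penalty stands in for the \ell_0 constraint, giving a convex problem
\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1
```

A larger α generally yields fewer nonzero coefficients, so the sparsity level can only be controlled indirectly, through α.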

I don't know how DFMax is implemented in MATLAB, but my suggestion is to do the following:

Use LassoCV to find the best alpha value.
If the number of nonzero coefficients is at or below your limit, take this alpha value.
If the number of nonzero coefficients is larger than your limit, fit Lasso over a list of increasing alphas, with your LassoCV alpha as the minimum value, and stop when the number of nonzero coefficients is equal to or below your threshold.
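
The steps above can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement for DFMax: the helper name `lasso_with_max_nonzero`, the geometric `growth` factor used to build the list of increasing alphas, the `max_steps` cap, and the synthetic data from `make_regression` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV


def lasso_with_max_nonzero(X, y, max_nonzero=10, growth=1.25, max_steps=200):
    """Return (model, alpha) with at most `max_nonzero` nonzero coefficients.

    Hypothetical helper: starts from the alpha chosen by LassoCV and
    increases it geometrically until the fit is sparse enough.
    """
    # Step 1: let cross-validation pick the starting (minimum) alpha.
    alpha = LassoCV(cv=5, random_state=0).fit(X, y).alpha_

    # Steps 2-3: accept the first alpha whose fit respects the limit.
    for _ in range(max_steps):
        model = Lasso(alpha=alpha).fit(X, y)
        if np.count_nonzero(model.coef_) <= max_nonzero:
            return model, alpha
        alpha *= growth  # a stronger penalty generally zeroes more coefficients

    raise RuntimeError("No alpha satisfied the limit within max_steps.")


# Synthetic data standing in for the real dataset (assumption for the demo).
X, y = make_regression(
    n_samples=500, n_features=100, n_informative=30, noise=5.0, random_state=0
)
model, alpha = lasso_with_max_nonzero(X, y, max_nonzero=10)
print("alpha:", alpha)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```

Note that any alpha above the cross-validated one trades predictive accuracy for sparsity, so it is worth checking the score of the final model. A smaller `growth` factor gives a finer search at the cost of more fits.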