python-3.x scikit-learn lasso-regression gridsearchcv

GridSearchCV gives different results than LassoCV for optimal alpha


I am aware of the standard process for finding the optimal value of alpha (lambda) by cross-validation with the GridSearchCV class in sklearn.model_selection. Here is my code to find it:

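    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import GridSearchCV, RepeatedKFold

    # candidate alphas and the cross-validation scheme
    alphas = np.arange(0.0001, 0.01, 0.0005)
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=100)

    hyper_param = {'alpha': alphas}

    model = Lasso()

    model_cv = GridSearchCV(estimator=model,
                            param_grid=hyper_param,
                            scoring='r2',
                            cv=cv,
                            verbose=1,
                            return_train_score=True
                            )

    model_cv.fit(X_train, y_train)
    # checking the best parameters
    model_cv.best_params_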

This gives me alpha=0.01

Now, looking at LassoCV: as I understand it, this class builds the model by selecting the best alpha from the list of alphas passed to it. Please note that I used the same cross-validation scheme for both. Here is what I tried with sklearn.linear_model.LassoCV and the RepeatedKFold cross-validation scheme:

    alphas=np.arange(0.0001,0.01,0.0005)
    cv=RepeatedKFold(n_splits=10,n_repeats=3,random_state=100)
    ls_cv_m=LassoCV(alphas,cv=cv,n_jobs=1,verbose=True,random_state=100)
    ls_cv_m.fit(X_train_reduced,y_train)
    print('Alpha Value %d'%ls_cv_m.alpha_)
    print('The coefficients are {}',ls_cv_m.coef_)

I get alpha=0 for the same data, and this value is not present in the list of decimal values passed in the alphas argument. This has confused me about how LassoCV is actually implemented. My questions are:

  • Why do I get an optimal alpha of 0 from LassoCV when the list passed to it does not contain zero?
  • What is the difference between LassoCV and Lasso, then, if I have to find the most suitable alpha with GridSearchCV anyway?

Solution

  • First, you should pass your alphas as a keyword argument rather than a positional one, since the first positional parameter of LassoCV is eps, not alphas.

    ls_cv_m=LassoCV(alphas=alphas,cv=cv,n_jobs=1,verbose=True,random_state=100)
    

    With that change, the model returns one of the alphas you defined as the optimal parameter. The 0 you see comes from printing it with %d, which formats the float as an integer and truncates it. Replace %d with %f to print it as a float:

    print('Alpha Value %f'%ls_cv_m.alpha_)
    

    See the Python documentation on string formatting for more details about printing formats and styles.
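
The truncation can be reproduced without any model. This is a minimal sketch in which the value 0.0096 is a made-up stand-in for ls_cv_m.alpha_, not a result from the question's data:

```python
# Hypothetical alpha, standing in for ls_cv_m.alpha_.
alpha = 0.0096

as_int = 'Alpha Value %d' % alpha      # %d drops the fractional part
as_float = 'Alpha Value %f' % alpha    # %f keeps six decimal places
as_general = 'Alpha Value %g' % alpha  # %g uses a compact representation

print(as_int)      # Alpha Value 0
print(as_float)    # Alpha Value 0.009600
print(as_general)  # Alpha Value 0.0096
```

For small alphas, %g (or an explicit precision such as %.4f) tends to be easier to read than the default %f.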

    As for your second question, Lasso is the linear model itself, fitted with a single alpha that you supply, while LassoCV is an estimator that fits Lasso over a set of alphas and uses cross-validation to select the best one.
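
To compare the two approaches side by side, here is a rough sketch on synthetic data. The dataset from make_regression, the smaller fold counts, and max_iter=10000 are assumptions made for illustration and are not taken from the question. LassoCV picks the alpha with the lowest mean squared error across folds, so the grid search below is scored with neg_mean_squared_error rather than r2 to put both on the same criterion. With scoring='r2', as in the question, the two searches may pick different alphas.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic data standing in for X_train / y_train (an assumption).
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

alphas = np.arange(0.0001, 0.01, 0.0005)
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=100)

# LassoCV: alphas passed by keyword; selects the alpha with the lowest mean CV MSE.
lasso_cv = LassoCV(alphas=alphas, cv=cv, max_iter=10000)
lasso_cv.fit(X, y)

# GridSearchCV scored by negative MSE, so both searches use the same criterion.
grid = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas},
                    scoring='neg_mean_squared_error', cv=cv)
grid.fit(X, y)

print('LassoCV alpha: %f' % lasso_cv.alpha_)
print('GridSearchCV alpha: %f' % grid.best_params_['alpha'])
```

Both printed values come from the alphas grid. Small differences between the two can still occur, because LassoCV fits the whole regularization path with warm starts while GridSearchCV fits each alpha independently.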