I am aware of the standard process for finding the optimal value of alpha/lambda with cross-validation, using the GridSearchCV class in sklearn.model_selection. Here's my code for that:
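import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# candidate alphas: 0.0001, 0.0006, ..., 0.0096
alphas = np.arange(0.0001, 0.01, 0.0005)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=100)
hyper_param = {'alpha': alphas}
model = Lasso()
model_cv = GridSearchCV(estimator=model,
                        param_grid=hyper_param,
                        scoring='r2',
                        cv=cv,
                        verbose=1,
                        return_train_score=True)
model_cv.fit(X_train, y_train)
# checking the best parameters
model_cv.best_params_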
This gives me alpha=0.01
Now, looking at LassoCV: as per my understanding, this class builds the model by selecting the best alpha from the passed alphas list. Please note that I have used the same cross-validation scheme for both. But when I try sklearn.linear_model.LassoCV with the RepeatedKFold cross-validation scheme:
alphas=np.arange(0.0001,0.01,0.0005)
cv=RepeatedKFold(n_splits=10,n_repeats=3,random_state=100)
ls_cv_m=LassoCV(alphas,cv=cv,n_jobs=1,verbose=True,random_state=100)
ls_cv_m.fit(X_train_reduced,y_train)
print('Alpha Value %d'%ls_cv_m.alpha_)
print('The coefficients are {}',ls_cv_m.coef_)
I get alpha=0 for the same data, and this alpha value is not present in the list of decimal values passed in the alphas argument.
This has confused me about the actual implementation of LassoCV.
My doubts are:
1. Why do I get an optimal alpha of 0 in LassoCV when the list passed to the argument does not have zero in it?
2. What is the difference between LassoCV and Lasso then, if I have to find the most suitable alpha with GridSearchCV anyway?

First, you should pass your alphas as a keyword parameter rather than a positional one, since the first positional parameter of LassoCV is eps.
ls_cv_m=LassoCV(alphas=alphas,cv=cv,n_jobs=1,verbose=True,random_state=100)
Then, the model does return one of the alphas you defined as the optimal parameter. You are simply printing it with %d, which casts the float to an int and so truncates it to 0. Replace %d with %f to print it as a float:
print('Alpha Value %f'%ls_cv_m.alpha_)
Have a look here for more details about Python printing formats and styles.
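To make the formatting point concrete, here is a minimal sketch of how the same small float prints under different format specifiers. The value 0.0096 is only a hypothetical stand-in for ls_cv_m.alpha_, picked from inside the alpha grid in the question.

```python
# Hypothetical stand-in for ls_cv_m.alpha_ (an assumption for this sketch).
alpha = 0.0096

as_int = 'Alpha Value %d' % alpha              # %d converts the float to an integer
as_float = 'Alpha Value %f' % alpha            # %f keeps six decimal places
as_fixed = 'Alpha Value {:.4f}'.format(alpha)  # str.format with explicit precision

print(as_int)    # Alpha Value 0
print(as_float)  # Alpha Value 0.009600
print(as_fixed)  # Alpha Value 0.0096
```

Any alpha below 1 will show up as 0 with %d, which is why the printed value did not appear in the list.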
As for your second question, Lasso is the linear model itself, fitted with one fixed alpha, while LassoCV is an iterative process that uses cross-validation to find the optimal alpha for a Lasso model.
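The relationship between the two can be sketched by running both searches over the same grid and the same folds. This is only an illustration under assumptions: the data comes from make_regression as a stand-in for X_train and y_train, the fold counts are reduced to keep it quick, and max_iter is raised to avoid convergence warnings. GridSearchCV is scored with neg_mean_squared_error here because LassoCV selects alpha by mean squared error, whereas the question used r2, so the two searches in the question need not agree exactly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic data standing in for X_train / y_train (an assumption for this sketch).
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

alphas = np.arange(0.0001, 0.01, 0.0005)
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=100)

# LassoCV searches the alpha grid itself, then refits on all data with the winner.
lasso_cv = LassoCV(alphas=alphas, cv=cv, max_iter=10000, random_state=100)
lasso_cv.fit(X, y)

# A plain Lasso needs an outer search such as GridSearchCV to pick alpha.
grid = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas},
                    scoring='neg_mean_squared_error', cv=cv)
grid.fit(X, y)

print('LassoCV alpha:      %f' % lasso_cv.alpha_)
print('GridSearchCV alpha: %f' % grid.best_params_['alpha'])
```

Both printed values come from the alphas grid. In practice LassoCV tends to be the more convenient choice when alpha is the only hyperparameter being tuned, while GridSearchCV is the more general tool.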