I'm trying to use the function cv.glmnet to find the best lambda (using ridge regression) in order to predict the class that some objects belong to.
So the code that I have used is:
CVGLM<-cv.glmnet(x,y,nfolds=34,type.measure = "class",alpha=0,grouped = FALSE)
I'm not actually using K-fold cross-validation because my dataset is too small: it has only 34 rows. So I set nfolds to the number of rows, which gives leave-one-out CV.
Now, I have some questions:
1) First of all: does the cv.glmnet function only tune the hyperparameter lambda, or does it also test the "final model"?
2) Once I have the best lambda, what should I do? Should I use the predict function?
If so, which data should I use, given that I already used all of the data to find lambda with LOO CV?
3) How can I calculate R^2 from the cv.glmnet function?
Here is an attempt to answer your questions:
1) cv.glmnet estimates the performance of each lambda value using the cross-validation scheme you specify. Here is an example:
library(glmnet)
data(iris)
Find the best lambda for iris prediction:
CVGLM <- cv.glmnet(as.matrix(iris[,-5]),
                   iris[,5],
                   nfolds = nrow(iris),
                   type.measure = "class",
                   alpha = 0,
                   grouped = FALSE,
                   family = "multinomial")
CVGLM$cvm holds one cross-validated misclassification error per lambda value, so the error of the best lambda is its minimum:
min(CVGLM$cvm)
#output
0.06
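If you test this independently using LOOCV and the best lambda:
# hold out one row at a time, refit at lambda.min, and predict the held-out row
z <- lapply(1:nrow(iris), function(x){
  fit <- glmnet(as.matrix(iris[-x,-5]),
                iris[-x,5],
                alpha = 0,
                lambda = CVGLM$lambda.min,
                family = "multinomial")
  pred <- predict(fit, as.matrix(iris[x,-5]), type = "class")
  # as.vector() drops the matrix wrapper so the column is reliably named "pred"
  return(data.frame(pred = as.vector(pred), true = iris[x,5]))
})
z <- do.call(rbind, z)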
and check the error rate, it is:
sum(z$pred != z$true)/nrow(iris)
#output
0.06
So there is no need to repeat the evaluation yourself with the same method that cv.glmnet used, since the result will be the same.
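As a rough sketch of how the cross-validated error can be read off the fitted object, the snippet below looks up the entry of cvm that corresponds to lambda.min and, for comparison, to lambda.1se. The names cv_fit, err_min and err_1se are illustrative only and do not come from the question.

```r
library(glmnet)

x <- as.matrix(iris[, -5])
y <- iris[, 5]

# Leave-one-out CV over the ridge path (alpha = 0), scored by misclassification
cv_fit <- cv.glmnet(x, y,
                    nfolds = nrow(x),
                    type.measure = "class",
                    alpha = 0,
                    grouped = FALSE,
                    family = "multinomial")

# cvm has one entry per value in cv_fit$lambda; pick the ones of interest
err_min <- cv_fit$cvm[which(cv_fit$lambda == cv_fit$lambda.min)]
err_1se <- cv_fit$cvm[which(cv_fit$lambda == cv_fit$lambda.1se)]

err_min  # CV error at the lambda with the lowest CV error
err_1se  # CV error at the more heavily regularised "one standard error" lambda
```

Calling plot(cv_fit) draws the whole error curve, which can help judge how sensitive the error is to the choice of lambda.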
2) Once you have the optimal lambda, you should fit a model on the whole data set using the glmnet function. What you do with the model afterwards is entirely up to you. Most people train a model to predict something.
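A minimal sketch of that step, assuming the iris setup from the example above: refit on all rows at lambda.min, then predict the class of a new observation. The object names and the values in new_obs are made up for illustration. The second predict call uses the full-data fit that cv.glmnet already stores, selected with s = "lambda.min", which avoids refitting at a single lambda.

```r
library(glmnet)

x <- as.matrix(iris[, -5])
y <- iris[, 5]

# Tune lambda with leave-one-out CV, as in the question
cv_fit <- cv.glmnet(x, y,
                    nfolds = nrow(x),
                    type.measure = "class",
                    alpha = 0,
                    grouped = FALSE,
                    family = "multinomial")

# Option A: refit on the whole data set at the selected lambda
final_fit <- glmnet(x, y,
                    alpha = 0,
                    lambda = cv_fit$lambda.min,
                    family = "multinomial")

# A hypothetical new observation with the same columns as x
new_obs <- matrix(c(5.0, 3.4, 1.5, 0.2), nrow = 1,
                  dimnames = list(NULL, colnames(x)))

pred_refit <- as.vector(predict(final_fit, newx = new_obs, type = "class"))

# Option B: predict from the cv.glmnet object, which holds a full-data fit
pred_cv <- as.vector(predict(cv_fit, newx = new_obs,
                             s = "lambda.min", type = "class"))

pred_refit
pred_cv
```

The two options should normally agree on the predicted class. Small numerical differences in the coefficients are possible, because option B takes them from a fit over the whole lambda path.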
3) What is R^2 for a classification problem? If you could define that, then you could calculate it.
R^2 = Explained variation / Total variation
What is this in terms of classes?
Anyhow, R^2 is not used for classification. People use AUC, deviance, accuracy, balanced accuracy, kappa, Youden's J and so on instead. Most of these are used for binary classification, but some are available for multinomial problems.
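To make a couple of these measures concrete, here is a small base-R sketch that computes accuracy and Cohen's kappa from a confusion matrix. The true and pred vectors are made-up toy labels, not results from any model. In practice they would be the true classes and the cross-validated predictions, such as z$true and z$pred above.

```r
# Toy labels, purely illustrative
true <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")
pred <- c("a", "a", "b", "b", "b", "b", "c", "c", "c", "a")

# Use the same level set for both so the confusion matrix is square
lv  <- sort(unique(c(true, pred)))
tab <- table(true = factor(true, levels = lv),
             pred = factor(pred, levels = lv))
n   <- sum(tab)

# Accuracy: share of observations on the diagonal
accuracy <- sum(diag(tab)) / n

# Cohen's kappa: agreement corrected for the agreement expected by chance
p_observed <- accuracy
p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa      <- (p_observed - p_expected) / (1 - p_expected)

tab
accuracy
kappa
```

If a deviance-based analogue of R^2 is what you are after, glmnet fit objects also carry dev.ratio, the fraction of null deviance explained by the model. It is computed on the training data, so it is not a cross-validated measure.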
I suggest this as further reading