I am attempting to do classification prediction using glmnet, however I cannot deduce what the return object of "glmnet.predict" is supposed to represent. Using the code
mlogit_r<-glmnet(train_x, cbind(cns_label, renal_label,breast_label,nsclc_label,ovarian_label,leuk_label,colon_label, mela_label),
family="multinomial", alpha=0)
pred <- predict(mlogit_r, train_x, type="class")
with train_x being 57(n) x 6830(p), and the y object being 57(n) x 8 (num classes). The returned prediction object is a 57 x 100 matrix with labels. Which of these are the predicted labels?
It does not show in the documentation, as it just says
The object returned depends the . . . argument which is passed on to the predict method for glmnet objects.
When you fit a glmnet model without specifying the lambda value, by default a range containing 100 lambda values is fit. When you call predict on such a model without specifying the lambda, the predictions are made for all lambda hence you receive 100 different predictions from a 100 different models.
Usually one runs cross validation to choose one lambda that is best and then predicts using it:
library(glmnet)
data(iris)
lets use 120 rows for training:
z <- sample(1:nrow(iris), 120)
now run a 5 - fold cross validation using miss classification error to chose the best lambda:
cv_fit <- cv.glmnet(as.matrix(iris[z,-5]),
iris[z,5],
nfolds = 5,
type.measure = "class",
alpha = 0,
grouped = FALSE,
family = "multinomial")
plot(cv_fit)
Here you can see the lambda.min corresponding to the dashed line on the left (lambda with lowest error in 5 fold cross validation) and lambda.1se (lambda with error of 1 se withing the lowest error near it on slightly on the right.
These values are in:
cv_fit$lambda.min
#[1] 0.05560455
cv_fit$lambda.1se
#[1] 0.09717054
Now when you know the best lambda you can either build a model on 100 lambda values:
fit <- glmnet(as.matrix(iris[z,-5]),
iris[z, 5],
alpha = 0,
family = "multinomial")
and predict on a specific one:
predict(fit, as.matrix(iris[-z,-5]), s = cv_fit$lambda.min, type = "class")
or build a model on one lambda
fit1 <- glmnet(as.matrix(iris[z,-5]),
iris[z, 5],
alpha = 0,
lambda = cv_fit$lambda.min,
family = "multinomial")
and predict without specifying lambda:
all.equal(as.vector(predict(fit, as.matrix(iris[-z,-5]), s = cv_fit$lambda.min, type = "class")),
as.vector(predict(fit1, as.matrix(iris[-z,-5]), type = "class")))
#TRUE
To see how much the coefficients were constrained you can plot the model and the lambda used:
plot(fit, xvar = "lambda")
abline(v = log(cv_fit$lambda.min), lty = 2)