In the original elastic net paper, Zou and Hastie (2005) examined the prostate cancer data for comparison purposes. I would like to reproduce their results using the glmnet package in R. As mentioned in the paper, the response is lpsa. The training and test sets are given by the variable train in the data. I assumed alpha = 0.26 (as in the paper) and used cross-validation to estimate lambda. But I could not get a test mean squared error close to the one reported in the paper (0.381). Where is my mistake?
The code I used is given below.
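library(ElemStatLearn)  # provides the prostate data
library(glmnet)
# Predictor matrix without the intercept column and the train indicator
x = model.matrix(lpsa ~ . - train, data = prostate)[, -1]
y = prostate$lpsa
# Row indices of the training and test sets
trainlab = which(prostate$train == "TRUE")
testlab = which(prostate$train == "FALSE")
y.test = y[testlab]
alph = 0.26
# Fit the elastic net path on the training set
en.mod = glmnet(x[trainlab, ], y[trainlab], alpha = alph)
# Choose lambda by 10-fold cross-validation (the cv.glmnet default)
set.seed(1)
cv.out = cv.glmnet(x[trainlab, ], y[trainlab], alpha = alph)
bestlambda = cv.out$lambda.min
# Test-set mean squared error at the selected lambda
en.pred = predict(en.mod, s = bestlambda, newx = x[testlab, ])
MSE.en = mean((en.pred - y.test)^2)
MSE.en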
[1] 0.5043356
According to the paper, they used an algorithm called LARS-EN, so you may want to look at the elasticnet package, which implements that algorithm.
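A minimal sketch of what that might look like, reusing the same train/test split as in the question. Note that enet() is parameterized differently from glmnet: it takes a quadratic penalty lambda, and the L1 part is chosen at prediction time through s (here as a fraction of the L1 norm). The values lambda = 1000 and s = 0.26 below are my assumptions about how the paper's tuning parameters map onto enet(), so please check them against the paper before relying on the result. In particular, the 0.26 may correspond to s here rather than to glmnet's alpha.

```r
library(ElemStatLearn)  # provides the prostate data
library(elasticnet)     # provides enet() and its predict method

x <- model.matrix(lpsa ~ . - train, data = prostate)[, -1]
y <- prostate$lpsa
train.idx <- which(prostate$train)
test.idx <- which(!prostate$train)

# lambda is the quadratic (ridge-type) penalty; 1000 is an assumed value
enet.mod <- enet(x[train.idx, ], y[train.idx], lambda = 1000)

# s = 0.26 is read as a fraction of the L1 norm (mode = "fraction"),
# which is an assumption about what the paper's 0.26 refers to
enet.pred <- predict(enet.mod, newx = x[test.idx, ], s = 0.26,
                     type = "fit", mode = "fraction")

# Test-set mean squared error
MSE.enet <- mean((enet.pred$fit - y[test.idx])^2)
MSE.enet
```

If you would rather tune s on the training set than fix it, cv.enet() in the same package runs K-fold cross-validation over a grid of s values for a given lambda.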