According to the glmnet vignette, a foldid can be set up with:
foldid=sample(1:10,size=length(y),replace=TRUE)
However, if you look at the number of observations in each of the folds:
> table(foldid)
foldid
1 2 3 4 5 6 7 8 9 10
10 12 8 7 12 12 8 7 14 10
The distribution is quite uneven: fold sizes range from 7 to 14. Each time I run cv.glmnet with a foldid precomputed this way on my own datasets (n < 30), cvm and lambda.min vary a lot, so I would like to try a foldid that spreads the observations more evenly across the folds. Could somebody suggest code to do this?
Never mind, I found the answer in the glmnet manual. With n = 100:
> foldid=sample(rep(seq(10),length=n))
> table(foldid)
foldid
1 2 3 4 5 6 7 8 9 10
10 10 10 10 10 10 10 10 10 10
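All the folds now have the same number of observations. When n is not a multiple of 10, rep(seq(10), length = n) still keeps the fold sizes within one observation of each other.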
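For small datasets like mine (n < 30), the same idea can be wrapped in a small helper. This is only a sketch: balanced_foldid is a hypothetical name, not a glmnet function. It takes the manual's one-liner and generalizes it to any number of folds. Because sample() only shuffles the repeated fold labels, the fold sizes are fixed by n and nfolds. Only the assignment of observations to folds is random.

```r
# Hypothetical helper (not part of glmnet): assign n observations to
# nfolds folds so that fold sizes differ by at most one.
balanced_foldid <- function(n, nfolds = 10) {
  stopifnot(n >= nfolds)  # every fold needs at least one observation
  # rep() cycles 1..nfolds up to length n; sample() shuffles the labels
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)  # fix the random assignment so it can be reused across runs
foldid <- balanced_foldid(27, nfolds = 10)
sizes <- table(foldid)
print(sizes)  # folds 1-7 get 3 observations, folds 8-10 get 2
```

The resulting vector can be passed as cv.glmnet(x, y, foldid = foldid). Reusing the same foldid across runs removes the fold assignment as a source of run-to-run variation in cvm and lambda.min.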