Tags: r, cross-validation, glmnet

How to precompute foldid with even observations per fold for glmnet


According to the glmnet vignette, a foldid can be set up by:

foldid=sample(1:10,size=length(y),replace=TRUE)

However, if you look at the number of observations in each of the folds:

> table(foldid)
foldid
 1  2  3  4  5  6  7  8  9 10 
10 12  8  7 12 12  8  7 14 10 

The fold sizes are quite uneven. With foldid precomputed this way, I get huge variation in cvm and lambda.min each time I run cv.glmnet (on my own datasets, n < 30), so I want to try a foldid that spreads the observations more evenly across folds. Could somebody suggest code to do this?


Solution

  • Never mind. Found an answer in the glmnet manual.

    > n=100
    > foldid=sample(rep(seq(10),length=n))
    > table(foldid)
    foldid
     1  2  3  4  5  6  7  8  9 10 
    10 10 10 10 10 10 10 10 10 10 
    

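    All the folds have the same number of observations: rep() cycles the labels 1 to 10 until there are n of them, and sample() only shuffles their order. When n is not a multiple of the number of folds, the fold sizes differ by at most one.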
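
For small datasets like the n < 30 case in the question, n is rarely a multiple of the number of folds, so it may help to see how the same idea behaves there. The following is a minimal sketch in base R; `make_foldid` is a hypothetical helper name (it is not part of glmnet), and the values n = 27 and 5 folds are arbitrary choices for illustration.

```r
# Hypothetical helper (not part of glmnet): balanced random fold assignment.
make_foldid <- function(n, nfolds = 10) {
  # rep_len() cycles the labels 1..nfolds until there are n of them,
  # so fold sizes differ by at most one; sample() then shuffles
  # which observation receives which label.
  sample(rep_len(seq_len(nfolds), n))
}

set.seed(1)  # arbitrary seed, only to make the assignment reproducible
foldid <- make_foldid(n = 27, nfolds = 5)

# Folds 1 and 2 get 6 observations each, folds 3 to 5 get 5 each.
print(table(foldid))
```

The resulting vector can then be passed on as `cv.glmnet(x, y, foldid = foldid)`, where `x` and `y` stand for your own predictor matrix and response. Keeping the seed fixed makes the fold assignment, and therefore the cross-validation result, repeatable between runs.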