
Random samples from createFolds in R


I want to split my dataset into 30 folds, so I used the createFolds function from the caret package in R. I call set.seed first so that the results are reproducible.

Now I want 20 different random partitions, in other words 20 different sets of 30 folds. As far as I can tell, that means changing the seed 20 times.

Is there an easier way to do this?

    wdbcc=as.data.frame(scale(wdbc))
    set.seed(12345)
    k = 30
    folds <- createFolds(wdbcc$PE, k = k, list = TRUE, returnTrain = TRUE)

NOTE

wdbcc is my dataset, k is the number of folds, and PE is the dependent variable.

EDIT1

Here is a brief example of what I want.

First, I create the folds with the following seed:

    wdbcc=as.data.frame(scale(wdbc))
    set.seed(12345)
    k = 30
    folds <- createFolds(wdbcc$PE, k = k, list = TRUE, returnTrain = TRUE)

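Then I fit one model per fold, using the training rows of that fold:

    # use a name other than "lm" so the list does not mask the lm() function
    models = list()
    for (i in 1:k) {
      # with returnTrain = TRUE, folds[[i]] holds the training rows for fold i
      models[[i]] = lm(PE ~ ., data = wdbcc[folds[[i]], ])
    }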

Then I repeat the same steps with set.seed(123456) instead of set.seed(12345) and fit the models on the new folds.

I need to do this 20 times, each time with a different seed and a fresh set of models.

EDIT2

Put simply: with 30 folds I fit a linear regression on each fold, which gives 30 models. I need to repeat that process 20 times, each time with a different set of 30 folds, so that I end up with 20 sets of 30 models.


Solution

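    folds <- replicate(20, createFolds(wdbcc$PE, k = k, list = FALSE))

This works if you do not mind getting the folds as a matrix rather than a list. With list = FALSE, createFolds returns one fold number per row of the data (returnTrain only applies when list = TRUE), so each of the 20 columns of the matrix is one complete assignment of rows to the 30 folds.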
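To make the matrix form concrete, here is a minimal sketch of how the columns could be used to fit the models. It is not the asker's data: mtcars stands in for wdbcc, mpg for PE, k is reduced to 5 because mtcars has only 32 rows, and the formula mpg ~ wt + hp is an arbitrary choice. The names fold_id, n_rep and models are illustrative only.

```r
library(caret)

# Stand-in data (assumption): mtcars replaces wdbcc, mpg replaces PE.
dat <- as.data.frame(scale(mtcars))
k <- 5        # the question uses k = 30
n_rep <- 20   # number of different partitions

set.seed(12345)  # a single seed makes all 20 partitions reproducible
fold_id <- replicate(n_rep, createFolds(dat$mpg, k = k, list = FALSE))

# fold_id has one row per observation and one column per repetition;
# fold_id[i, r] is the fold that row i belongs to in repetition r.
models <- lapply(seq_len(n_rep), function(r) {
  lapply(seq_len(k), function(j) {
    # train on every row that is not in fold j of repetition r
    lm(mpg ~ wt + hp, data = dat[fold_id[, r] != j, ])
  })
})
```

Here models[[r]][[j]] would be the model trained with fold j held out in repetition r.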

Edit: the replicate() call is already reproducible. If you call set.seed once before it, you get the same 20 sets of folds every time you run the code with that seed. If you really want a specific seed for every resample (which should not be necessary), you can do the following:

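    wdbcc <- as.data.frame(scale(wdbc))
    lmv <- vector("list", 20)
    mySeed=c(1,2,3,4,5,...,20) #vector with your pre-defined seeds (the "..." is a placeholder: write out all 20 values)

    for (i in seq_along(lmv)) {
        set.seed(mySeed[i])
        lmv2 <- vector("list", 30)
        # list = FALSE returns one fold number (1 to 30) per row of wdbcc
        folds <- createFolds(wdbcc$PE, k = 30, list = FALSE)
        for (j in seq_along(lmv2)) {
            # train on every row that is not in fold j
            lmv2[[j]] <- lm(PE ~ ., data = wdbcc[folds != j, ])
        }
        lmv[[i]] <- lmv2  # the 30 models fitted under seed i
    }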
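
If you would rather keep the list format from the question (list = TRUE, returnTrain = TRUE), the same per-seed idea could be sketched with lapply. This is only an illustration under assumptions: mtcars stands in for wdbcc, mpg for PE, k is 5 instead of 30, the seeds 1:20 are hypothetical, and the names seeds, fold_sets and models are made up for the example.

```r
library(caret)

# Stand-in data (assumption): mtcars replaces wdbcc, mpg replaces PE.
dat <- as.data.frame(scale(mtcars))
k <- 5          # the question uses k = 30
seeds <- 1:20   # hypothetical seeds; substitute your own

# One list of k training-index vectors per seed.
fold_sets <- lapply(seeds, function(s) {
  set.seed(s)
  createFolds(dat$mpg, k = k, list = TRUE, returnTrain = TRUE)
})

# For every seed, fit one model per fold on that fold's training rows.
models <- lapply(fold_sets, function(folds) {
  lapply(folds, function(train_rows) {
    lm(mpg ~ wt + hp, data = dat[train_rows, ])
  })
})
```

With this layout, fold_sets[[r]][[j]] holds the training rows and models[[r]][[j]] the fitted model for fold j under seed r.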