I want to split my dataset into 30 folds, so I used the createFolds function from the caret package in R. I call set.seed first to get reproducible results.
Now I want 20 different random samples, in other words 20 different 30-fold splits. That means changing the set.seed value 20 times.
Is there an easier way to do this?
library(caret)
wdbcc=as.data.frame(scale(wdbc))
set.seed(12345)
k = 30
folds <- createFolds(wdbcc$PE, k = k, list = TRUE, returnTrain = TRUE)
NOTE: wdbcc is my dataset, k is the number of folds, and PE is the dependent variable.
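To make the return value of the call above concrete, here is a minimal sketch, assuming caret is installed. The data frame sim is a made-up stand-in for wdbcc (only the response name PE is kept). With list = TRUE and returnTrain = TRUE, each list element holds the training rows for one fold, so the held-out rows of fold i are simply the rows not in folds[[i]].

```r
library(caret)

# Hypothetical stand-in for wdbcc: 300 simulated rows, response PE
set.seed(1)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$PE <- 2 * sim$x1 - sim$x2 + rnorm(n)

set.seed(12345)
k <- 30
folds <- createFolds(sim$PE, k = k, list = TRUE, returnTrain = TRUE)

length(folds)  # one training-index vector per fold

# Held-out rows of fold i = all rows minus its training rows
held_out <- lapply(folds, function(train) setdiff(seq_len(n), train))

# Every row should be held out in exactly one fold
all(tabulate(unlist(held_out), nbins = n) == 1)  # TRUE
```

This is why the loop in EDIT1 can pass folds[[i]] directly as the training rows.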
EDIT1
Here is a brief example of what I want. First I use the following seed:
wdbcc=as.data.frame(scale(wdbc))
set.seed(12345)
k = 30
folds <- createFolds(wdbcc$PE, k = k, list = TRUE, returnTrain = TRUE)
Then I fit a model on the training set of each fold:
models = list()
for (i in 1:k) {
  # folds[[i]] holds the training rows for fold i (returnTrain = TRUE)
  models[[i]] = lm(PE ~ ., data = wdbcc[folds[[i]], ])
}
Then I repeat the same procedure with set.seed(123456) instead of set.seed(12345) and fit the models on the new split.
I need to do this 20 times, each time with a different seed and a new set of models.
EDIT2
Put simply: with 30 folds, I fit a linear regression on each of the 30 training sets, which gives 30 model results. I need to repeat this whole process 20 times, each time with a different set of 30 folds, so in total I fit models on 20 x 30 = 600 training sets.
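library(caret)
set.seed(12345)  # set once; all 20 splits are then reproducible
folds <- replicate(20, createFolds(wdbcc$PE, k = k, list = FALSE))
This works if you do not mind having the folds as vectors (the columns of a matrix) rather than a list. With list = FALSE, createFolds returns the fold number (1 to k) of each row, so column r of folds is the fold assignment for the r-th repeat; returnTrain only has an effect when list = TRUE.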
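To show how that matrix feeds into the model fitting from EDIT1, here is a minimal sketch, assuming caret is installed; sim is a made-up stand-in for wdbcc and n_rep is an illustrative name for the number of repeats. Fold j of repeat r is held out by keeping the rows where folds[, r] != j.

```r
library(caret)

# Hypothetical stand-in for wdbcc: 300 simulated rows, response PE
set.seed(1)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$PE <- 2 * sim$x1 - sim$x2 + rnorm(n)

k <- 30
n_rep <- 20
set.seed(12345)  # a single seed fixes all n_rep splits
# n x n_rep matrix: folds[row, r] is the fold (1..k) of that row in repeat r
folds <- replicate(n_rep, createFolds(sim$PE, k = k, list = FALSE))

# models[[r]][[j]] is trained on every row outside fold j of repeat r
models <- lapply(seq_len(n_rep), function(r) {
  lapply(seq_len(k), function(j) lm(PE ~ ., data = sim[folds[, r] != j, ]))
})
```

Nesting two lapply calls keeps the 20 x 30 structure explicit without pre-allocating lists by hand.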
Edit: the code above already gives reproducible results: as long as you call set.seed once beforehand, you get the same folds (all 20 splits) every time you run it. However, if you really want a specific seed for every resample (which I would question), you can do the following:
library(caret)
wdbcc = as.data.frame(scale(wdbc))
mySeed = 1:20  # vector with your pre-defined seeds, one per repeat
lmv = vector("list", length(mySeed))  # lmv[[i]] holds the 30 models for seed i
for (i in seq_along(lmv)) {
  set.seed(mySeed[i])
  lmv2 = vector("list", 30)
  # fold number (1..30) of every row
  folds <- createFolds(wdbcc$PE, k = 30, list = FALSE)
  for (j in seq_along(lmv2)) {
    # train on all rows outside fold j
    lmv2[[j]] = lm(PE ~ ., data = wdbcc[folds != j, ])
  }
  lmv[[i]] = lmv2
}
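caret also ships createMultiFolds(y, k, times), which, as far as I know, repeats createFolds(..., returnTrain = TRUE) times times and returns one flat list of training-index vectors. Below is a minimal sketch, assuming caret is installed and using the made-up data frame sim in place of wdbcc. It also checks the reproducibility claim above: re-running with the same seed gives identical splits.

```r
library(caret)

# Hypothetical stand-in for wdbcc: 300 simulated rows, response PE
set.seed(1)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$PE <- 2 * sim$x1 - sim$x2 + rnorm(n)

# 30 folds x 20 repeats = 600 training-index vectors in one list
set.seed(12345)
train_sets <- createMultiFolds(sim$PE, k = 30, times = 20)
length(train_sets)

# Same seed, same call: all 600 splits are reproduced exactly
set.seed(12345)
identical(train_sets, createMultiFolds(sim$PE, k = 30, times = 20))  # TRUE

# One model per training set
models <- lapply(train_sets, function(idx) lm(PE ~ ., data = sim[idx, ]))
```

The list names encode the fold and repeat, so the flat list can still be split back into 20 groups of 30 if needed.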