Predicting chunks with M models in R

I have a dataset (HEART) that I split into chunks. I would like to predict each chunk with its M = 3 preceding models: chunk 10 with models 7, 8, 9; chunk 9 with models 6, 7, 8; and so on down to chunk 4 with models 1, 2, 3. Here is my code:

library(caret)
library(rpart)
dat1 <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"), header = FALSE, sep = ",")
colnames(dat1) <- c(LETTERS[1:(ncol(dat1) - 1)], "CLA")
dat1$CLA <- as.factor(dat1$CLA)

chunk <- 30
n <- nrow(dat1)
r <- rep(1:floor(n/chunk), each = chunk)[1:n]
d <- split(dat1, r)

M <- 3
N <- floor(n/chunk)
cart.models <- list()
for (i in 1:N) {
  cart.models[[i]] <- rpart(CLA ~ ., data = d[[i]])
}
for (i in (1 + M):N) {
  k <- 0
  for (j in (i - M):(i - 1)) {
    k <- k + 1
    d[[i]][, (ncol(d[[i]]) + k)] <- predict(cart.models[[j]], d[[i]][, c(-14)], type = "class")
  }
}

I get the following error:

Error in `[<-.data.frame`(`*tmp*`, , (ncol(d[[i]]) + k), value = c(1L,  : 
  new columns would leave holes after existing columns

Solution

Your question is a bit puzzling: you load caret without using any of its functions. The objective looks like a time series analysis, but instead of building a model on one chunk and predicting the chunk that follows, you want each chunk predicted by the three models before it, so createTimeSlices from caret won't do the trick. You could create custom folds in caret with the index and indexOut arguments of trainControl, but that would lead to fitting more models (21 to be exact) than the presented objective requires (9). So I believe plain loops are an appropriate way to go.
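As for the error itself: in the posted loop `ncol(d[[i]])` is re-evaluated on every pass, so it has already grown by one when `k` reaches 2, and the target index skips a column. A minimal sketch on a made-up data frame (`df` and `n.orig` are names invented for this illustration, not part of the question):

```r
# Toy data frame standing in for one chunk (hypothetical, not the HEART data)
df <- data.frame(a = 1:3)

# Same indexing pattern as the question: ncol(df) grows after every assignment,
# so at k = 2 the target is column 2 + 2 = 4 while column 3 does not exist yet
msg <- tryCatch({
  for (k in 1:3) df[, ncol(df) + k] <- k
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Recording the column count once, before the loop, keeps new columns contiguous
df <- data.frame(a = 1:3)
n.orig <- ncol(df)
for (k in 1:3) df[, n.orig + k] <- k
ncol(df)
```

Storing the predictions in a separate object, as done in the steps that follow, avoids the problem altogether and leaves the chunks untouched.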

Create the models:

library(rpart)

N <- 9
cart.models <- list()
for(i in 1:N){
  cart.models[[i]] <- rpart(CLA~ ., data = d[[i]])
}

N can be 9 because model 10 is never used for prediction.

Create a matrix to store the predictions:

cart.predictions <- matrix(nrow = chunk, ncol = length(4:10)*3)

It needs as many rows as there are observations in each chunk (30) and as many columns as there are model/chunk combinations (three models for each of chunks 4:10, so 21).

k <- 0 # column counter
for (j in 4:10) { # predict on chunks 4:10
  p <- j - 3
  for (i in p:(p + 2)) { # using models (chunk - 3):(chunk - 1)
    k <- k + 1
    predi <- predict(cart.models[[i]], d[[j]], type = "class")
    cart.predictions[, k] <- predi # the factor is stored as its integer codes
  }
}

This creates a numeric matrix of predictions. When R converts a factor to numeric it uses the internal codes: 1 for the first level, 2 for the second, and so on. Since the levels here are 0:4, subtracting 1 recovers them:

cart.predictions <- as.data.frame(cart.predictions - 1)
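Subtracting 1 works here only because the class levels happen to be 0:4. A small sketch with a made-up factor (not the actual predictions) compares that shortcut with a lookup through `levels()`, which should hold for any numeric level set:

```r
# Made-up predictions with the same level set as CLA (0 to 4)
f <- factor(c("0", "2", "4", "2"), levels = c("0", "1", "2", "3", "4"))

codes <- as.integer(f)                     # internal codes: 1 3 5 3
by.offset <- codes - 1                     # 0 2 4 2, valid only because the levels are 0:4
by.lookup <- as.numeric(levels(f))[codes]  # looks each label up instead of shifting

identical(by.offset, by.lookup)
```

If the levels were, say, 1, 5 and 10, only the lookup version would return the original values.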

To create the column names, in the form model_chunk:

ids <- expand.grid(3:1, 4:10)
ids$Var1 <- with(ids, Var2 - Var1)

colnames(cart.predictions) <- make.names(paste0(ids$Var1, "_", ids$Var2))
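To see why the naming lines up with the column order filled by the loop, here is the same idea on a shorter range of chunks. The names `grid`, `lag`, `model` and `col.labels` are chosen for this sketch only:

```r
# Three chunks instead of seven to keep the output short (illustrative only)
grid <- expand.grid(lag = 3:1, chunk = 4:6)  # the first variable varies fastest
grid$model <- grid$chunk - grid$lag          # model index behind each column
col.labels <- make.names(paste0(grid$model, "_", grid$chunk))
col.labels
# "X1_4" "X2_4" "X3_4" "X2_5" "X3_5" "X4_5" "X3_6" "X4_6" "X5_6"
```

Because `expand.grid` cycles through its first argument fastest, passing `3:1` yields models in increasing order within each chunk, which is the order in which the inner loop writes its columns. `make.names` prepends the `X` because a syntactic name cannot start with a digit.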

Let's check that it is correct. The prediction from model 5 on chunk 6, converted to numeric:

as.numeric(as.character(predict(cart.models[[5]], d[[6]], type = "class")))

should be equal to

cart.predictions[["X5_6"]] #that's how the names were designed

all.equal(as.numeric(as.character(predict(cart.models[[5]], d[[6]], type = "class"))),
          cart.predictions[["X5_6"]])
#output
TRUE

Or you can create a character matrix in the first place and store the labels themselves:

cart.predictions <- matrix(data = NA_character_, nrow = chunk, ncol = length(4:10)*3)

k <- 0 # column counter
for (j in 4:10) {
  p <- j - 3
  for (i in p:(p + 2)) {
    k <- k + 1
    predi <- predict(cart.models[[i]], d[[j]], type = "class")
    cart.predictions[, k] <- as.character(predi) # store labels, not integer codes
  }
}

cart.predictions <- as.data.frame(cart.predictions)

This should be the preferred method when the classes are labels rather than numbers.
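One thing to watch with a character matrix: assigning a factor straight into it stores the integer codes as strings, so converting with `as.character()` first is the safer route. A minimal sketch with a made-up factor (the labels are invented for illustration):

```r
# Made-up class labels standing in for predict(..., type = "class") output
f <- factor(c("sick", "healthy", "sick"))

m <- matrix(NA_character_, nrow = 3, ncol = 2)
m[, 1] <- f                # ends up holding the codes as strings, not the labels
m[, 2] <- as.character(f)  # holds the labels themselves
m
```

The first column no longer says which class was predicted unless you keep the level order around, while the second column is self-describing.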