
Deepboost prediction of type = "prob" using caret package is not working


I am trying to fit a deepboost model using the caret package. I downloaded the data from this link:

library(caret)

prc <- read.csv("Prostate_Cancer.csv",stringsAsFactors = FALSE)

prc <- na.omit(prc[-1])  # drop the first column (id), then remove rows with missing values

# min-max scaling to the range [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

prc_n <- as.data.frame(lapply(prc[2:9], normalize))
summary(prc_n)

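# the outcome was read as character (stringsAsFactors = FALSE), so make it a factor explicitly
df <- cbind(prc_n, diagnosis_result = factor(prc$diagnosis_result))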
head(df,2)

# create a list of 70% of the rows in the original dataset we can use for training
set.seed(123)
training <- sample(nrow(df), 0.7 * nrow(df))

dataTrain <- df[training,]
dataTest <- df[-training,]

trainControl <- trainControl(method="repeatedcv", number=10, repeats=5,
                             savePredictions=TRUE, classProbs=TRUE)
#Deepboost
set.seed(7)
fit.dpb <- train(diagnosis_result~., data=dataTrain, method="deepboost", 
                             trControl=trainControl)
fit.dpb

dpb_cal_prob <- predict(fit.dpb, newdata = dataTrain, type = "prob")
dpb_val_prob <- predict(fit.dpb, newdata = dataTest, type = "prob")

dpb_cal <- predict(fit.dpb, newdata = dataTrain)
dpb_val <- predict(fit.dpb, newdata = dataTest)

# variable importance
varImp(fit.dpb, scale=TRUE)
plot(varImp(fit.dpb, scale=TRUE))

Neither varImp nor predict(fit.dpb, newdata = dataTrain, type = "prob") works. Can anyone help me out?


Solution

  • Based on the caret source code for deepboost, the model cannot predict probabilities and has no varImp method: its model list defines neither a prob nor a varImp element. Compare it with the glmnet source code, for instance, whose model list defines both.
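
    The difference can also be checked from R without reading the source files. The following is a minimal sketch, assuming caret is installed and that getModelInfo() returns the same model lists that train() uses; the helper has_slots and the variable names are illustrative and not part of caret.

    ```r
    library(caret)

    # Fetch the model definition lists by exact name (regex = FALSE avoids partial matches)
    deepboost_info <- getModelInfo("deepboost", regex = FALSE)[[1]]
    glmnet_info    <- getModelInfo("glmnet", regex = FALSE)[[1]]

    # Hypothetical helper: TRUE where the model list provides a function for that element
    has_slots <- function(info) {
      c(prob = is.function(info$prob), varImp = is.function(info$varImp))
    }

    # One row per model, one column per element
    slot_check <- rbind(deepboost = has_slots(deepboost_info),
                        glmnet    = has_slots(glmnet_info))
    print(slot_check)
    ```

    If the description above holds for the installed caret version, the deepboost row should be FALSE in both columns and the glmnet row TRUE in both.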

    EDIT: To make the deepboost model predict probabilities, you need to modify the source code of its caret model definition.

    Create a custom model list:

    deepboost_prob <- list(label = "DeepBoost",
                           library = "deepboost",....
    

    and copy the rest of the list from the source code: https://github.com/topepo/caret/blob/master/models/files/deepboost.R
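
    As an alternative to pasting the file by hand, the same list can probably be taken from the installed caret package. This is only a sketch, assuming getModelInfo() returns the definition that corresponds to the linked deepboost.R; it was not part of the original answer.

    ```r
    library(caret)

    # Copy caret's built-in deepboost definition into a new, modifiable list
    deepboost_prob <- getModelInfo("deepboost", regex = FALSE)[[1]]

    # Inspect the copy before adding elements to it
    print(deepboost_prob$label)
    print(names(deepboost_prob))
    ```

    The copied list can then be extended and passed to the method argument of train() in place of the string "deepboost".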

    Add a prob slot:

    deepboost_prob$prob <- function(modelFit, newdata, submodels = NULL) {
      if(!is.data.frame(newdata)) 
        newdata <- as.data.frame(newdata, stringsAsFactors = TRUE)
      probs <- deepboost:::predict(modelFit, newdata, type = "response")
      probs <- as.data.frame(probs, stringsAsFactors = FALSE)
      colnames(probs) <- modelFit@classes
      probs
    }
    

    Check that it works:

    library(mlbench)
    library(caret)
    data(Sonar)
    
    trainControl <- trainControl(method = "cv",
                                 number = 5,
                                 savePredictions = TRUE,
                                 classProbs = TRUE)
    
    set.seed(7)
    fit.dpb <- train(x = Sonar[1:150,1:60],
                     y = Sonar$Class[1:150],
                     method = deepboost_prob, 
                     trControl = trainControl,
                     tuneLength = 1)
    Warning messages:
    1: In model.matrix.default(mt, mf, contrasts) :
      non-list contrasts argument ignored
    2: In model.matrix.default(mt, mf, contrasts) :
      non-list contrasts argument ignored
    3: In model.matrix.default(mt, mf, contrasts) :
      non-list contrasts argument ignored
    4: In model.matrix.default(mt, mf, contrasts) :
      non-list contrasts argument ignored
    5: In model.matrix.default(mt, mf, contrasts) :
      non-list contrasts argument ignored
    6: In model.matrix.default(mt, mf, contrasts) :
      non-list contrasts argument ignored
    

    I get these warnings with the original deepboost implementation as well. Predicting probabilities now works:

    predict(fit.dpb, Sonar[200:208,1:60], type = "prob")
              M         R
    1 0.3721210 0.6278790
    2 0.4087576 0.5912424
    3 0.3700643 0.6299357
    4 0.3656457 0.6343543
    5 0.5232370 0.4767630
    6 0.2439648 0.7560352
    7 0.3687249 0.6312751
    8 0.2679716 0.7320284
    9 0.3292782 0.6707218
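
    As a quick sanity check on the output above, the probabilities in each row should sum to 1, and a class label can be derived by taking the column with the larger probability. The sketch below uses only base R and the values printed above. Whether these labels match what predict(fit.dpb, ...) returns without type = "prob" is an assumption that was not verified here.

    ```r
    # Probabilities copied from the printed output above
    probs <- data.frame(
      M = c(0.3721210, 0.4087576, 0.3700643, 0.3656457, 0.5232370,
            0.2439648, 0.3687249, 0.2679716, 0.3292782),
      R = c(0.6278790, 0.5912424, 0.6299357, 0.6343543, 0.4767630,
            0.7560352, 0.6312751, 0.7320284, 0.6707218)
    )

    # Each row should sum to 1 (up to rounding in the printed values)
    row_sums_ok <- all(abs(rowSums(probs) - 1) < 1e-6)
    print(row_sums_ok)

    # Pick the class with the larger probability in each row
    best_col <- max.col(as.matrix(probs), ties.method = "first")
    pred_class <- factor(colnames(probs)[best_col], levels = c("M", "R"))
    print(table(pred_class))
    ```

    For these nine rows, only the fifth has a probability above 0.5 for class M, so it is the only one labelled M.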