
How to run predict.boosting for new data?


I am trying to use predict.boosting from the adabag package on new data. I can't find a way to use it (or any other function from that package) on data without labels.

I am trying:

pr <- predict.boosting(modelfit, test[,2:ncol(test)])

It gives:

Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) : 
  undefined columns selected

However, if I include labels:

pr <- predict.boosting(modelfit, test)

it works just fine. But there has to be a way to use it as a predictive model for data without labels.

Thanks for any help!

EDIT: Example from the package:

library(rusboost)
library(rpart)
data(iris)

# make it an unbalanced dataset by removing most of the setosa observations

df <- iris[41:150,]

# create binary variable

df$Setosa <- factor(ifelse(df$Species == "setosa", "setosa", "notsetosa"))

# create index of negative examples

idx <- df$Setosa == "notsetosa"

# run model

test.rusboost <- rusb(Setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                      data = df, boot = FALSE, iters = 20, sampleFraction = .1, idx = idx)

predict.boosting(test.rusboost, df)
predict.boosting(test.rusboost, df[,1:4])

Solution

  • You should check that all the columns in train (the set you used to train the model) are present in test, with the same names.

    Please check:

    all(colnames(train) %in% colnames(test))
    

    If it's FALSE, you will need to check how you built train and test.

    If it's TRUE, and in general, please provide a reproducible example.

    Edit:

    A nice way to check that the columns are the same, and that they contain the same factor levels, is to use sameShape from the dataPreparation package. If that is not the case, it will add the missing levels and columns (and warn you).

    To use it:

    library(dataPreparation)
    test <- sameShape(test, train)
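
The column check above only returns TRUE or FALSE. As a minimal base R sketch, the snippet below lists which training columns are missing from the new data and then adds a placeholder response column so that the lookup named in the error message can succeed. The names `unlabeled` and `add_placeholder_response` are hypothetical, and the workaround assumes that predict.boosting only needs the response column to exist, which is what the error suggests but is not confirmed here.

```r
# Hypothetical data: 'train' has the label column, 'unlabeled' does not.
train <- iris[1:100, ]
unlabeled <- iris[101:150, 1:4]

# List the training columns that are absent from the new data.
missing_cols <- setdiff(colnames(train), colnames(unlabeled))
print(missing_cols)

# Hypothetical helper: add a dummy response column carrying the training levels.
# Assumption: the predict function only indexes this column; the dummy values
# do not represent real labels.
add_placeholder_response <- function(newdata, formula, lvls) {
  response <- as.character(formula[[2]])
  if (!response %in% colnames(newdata)) {
    newdata[[response]] <- factor(rep(lvls[1], nrow(newdata)), levels = lvls)
  }
  newdata
}

unlabeled <- add_placeholder_response(unlabeled, Species ~ ., levels(train$Species))

# With a fitted model this would be called as, for example:
# pr <- predict.boosting(modelfit, unlabeled)
```

If this works for your version of the package, only the predicted classes would be meaningful. Any confusion matrix or error rate computed against the placeholder labels should be ignored.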