Prediction using mboost multinomial logistic regression in R



I am trying to fit a multinomial logistic regression model with the mboost package in R. I found the example below online and added "newdata = iris" to the predict call to see how prediction on new data works in mboost, but this produces an error. Here is the code:

library(mboost)

### fitting multinomial logit model via a linear array model
X0 <- K0 <- diag(nlevels(iris$Species) - 1)
colnames(X0) <- levels(iris$Species)[-nlevels(iris$Species)]
mlm <- mboost(Species ~ bols(Sepal.Length, df = 2) %O%
            buser(X0, K0, df = 2), data = iris,
          family = Multinomial())
round(predict(mlm, type = "response", newdata = iris), 2)

The error I'm getting is as follows:
Error in `[.data.frame`(newdata, nm) : undefined columns selected

I re-used the iris data for prediction only as a test. Has anyone experienced this problem before?


Solution

  • The reason why you cannot use the predict function with new data is that you use pre-defined design and penalty matrices in buser(), i.e., X0 and K0. These are not part of the new data set and thus are not available when building new design matrices for prediction.

    Sarah Brockhaus posted a solution on GitHub that replaces buser with bols. To do this, one needs to convert the data set to a list and add the new dummy to this list. To predict with new data, the new data must contain this same dummy, unchanged. See also my post on GitHub.

    [Edit] As @Lorcan-Treanor mentions in his comments, the number of factors needed for bols is not always equal to two. Here it actually is nlevels(iris$Species) - 1, i.e., one less factor level than we have classes in the outcome. I've also updated my post on GitHub accordingly.
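
The approach described above could look roughly like the following. This is a minimal sketch, not the exact code from the GitHub post: the names `myiris`, `class`, and `n_dummy` are made up for illustration, and the use of `contrasts.arg = "contr.dummy"` to get plain dummy coding in bols is an assumption about how the buser term is best replaced.

```r
library(mboost)

# Number of non-reference outcome classes (2 for iris).
# n_dummy is a hypothetical helper name, not taken from the original post.
n_dummy <- nlevels(iris$Species) - 1

# Convert the data to a list so it can hold a factor whose length differs
# from the number of observations.
myiris <- as.list(iris)

# Dummy factor with one level per non-reference class; it takes over the
# role of X0 from the question (assumed name: class).
myiris$class <- factor(levels(iris$Species)[-nlevels(iris$Species)])

# Same model as in the question, with buser(X0, K0, ...) replaced by bols().
mlm <- mboost(Species ~ bols(Sepal.Length, df = 2) %O%
                bols(class, df = n_dummy, contrasts.arg = "contr.dummy"),
              data = myiris, family = Multinomial())

# New data must also be a list and must carry the same, unchanged dummy.
newdata <- as.list(iris[1:5, ])
newdata$class <- myiris$class

pred <- predict(mlm, type = "response", newdata = newdata)
round(pred, 2)
```

Because the dummy is now an ordinary variable in the data list, predict can rebuild the design matrix from newdata instead of looking for the externally defined X0 and K0.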