I want to do ordinal regression in R, so I want to use the polr
function from the MASS
package. First I create a model like this:
model <- polr(labels ~ var1 + var2, Hess = TRUE)
Now I want to use the model to predict new cases. I thought that would simply be:
pred <- predict(model, data = c(newVar1, newVar2))
However it seems that predict is somehow predicting on the training set, not the new data. When my training set is 2000 examples, and my new data is 700 examples. I still get 2000 predicted labels.
So my question is: how do I use polr
to make predictions on new data?
Sadly there is no documentation entry for predict.polr
, otherwise you can simply read that for how to use predict
correctly.
In R, only for few primitive model fitting functions like smooth.spline
, predict
expect a vector for newdata (this is reasonable as smooth.spline
handles univariate regression). Generally, predict
expects a data frame or list, whose names match variables specified in model formula or as shown in model frame (the "terms" attributes). If you fit a model:
labels ~ var1 + var2
then you should construct newdata
:
predict(model, newdata = data.frame(var1 = newVar1, var2 = newVar2))
or
predict(model, newdata = list(var1 = newVar1, var2 = newVar2))
Note, it is newdata
, not data
for predict
.
Since there is no documentation, it may be good if we look at:
args(MASS:::predict.polr)
#function (object, newdata, type = c("class", "probs"), ...)
and you can even check source code (not long):
MASS:::predict.polr
You will see in source code:
newdata <- as.data.frame(newdata)
m <- model.frame(Terms, newdata, na.action = function(x) x,
xlev = object$xlevels)
This explains why newdata
should be passed as a data frame, and why variable names must match what is in Terms
.
Here is a reproducible example:
library(MASS)
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
## check model terms inside model frame
attr(terms(house.plr$model), "term.labels")
# [1] "Infl" "Type" "Cont"
When making prediction, these will not work:
## `data` ignored as no such argument
predict(house.plr, data = data.frame("Low", "Tower", "Low"))
## no_match in names
predict(house.plr, newdata = data.frame("Low", "Tower", "Low"))
This works:
predict(house.plr, newdata = data.frame(Infl = "Low", Type = "Tower", Cont = "Low"))
#[1] Low
#Levels: Low Medium High