
Evaluating linear model with new data returns fitted values


I am constructing and evaluating my model as shown below.

yData <- rnorm(10)
xData <- matrix(rnorm(20), 10, 2)
polyModel <- lm(yData~polym(xData, degree=2, raw=T))
newData <- matrix(rnorm(100), 50, 2)
yPredicted <- predict(polyModel, polym(newData, degree=2, raw=T))

However, yPredicted is simply equal to the fitted values, polyModel$fitted.values, a vector of length 10. I was expecting yPredicted to be a vector of length 50 in this case. Any help would be much appreciated.
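The same behavior can be reproduced with a single predictor. The formula refers to objects in the workspace rather than to columns of a data frame, so predict() looks for a variable named x in newdata; when there is none, it appears to fall back to the x in the formula's environment, that is, the original 10 rows. The names x, y and z below are hypothetical, chosen only for this illustration.

```r
# Minimal one-predictor reproduction (hypothetical names x, y, z).
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)
fit <- lm(y ~ x)          # x and y are taken from the global environment

# newdata has no column called "x", so predict() falls back to the global x
# (R also warns that newdata and the variables found have different row counts)
p1 <- predict(fit, newdata = data.frame(z = rnorm(50)))
length(p1)                # 10, the same values as fitted(fit)

# With a column named "x", the 50 new rows are used
p2 <- predict(fit, newdata = data.frame(x = rnorm(50)))
length(p2)                # 50
```

In the question, the formula refers to xData, and the object passed as newdata contains no variable of that name, so the original xData is used.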


Solution

  • predict() matches the variables in newdata by name, so it only works reliably when the model's variables come from the data argument of lm() (here, the columns of a data frame). This works:

    polyModel <- lm(yData~poly(V1, V2, degree=2, raw=TRUE),
                    data=as.data.frame(xData))
    length(fitted(polyModel))  ## 10
    newData <- matrix(rnorm(100), 50, 2)
    yPredicted <- predict(polyModel, newdata=as.data.frame(newData))
    length(yPredicted) ## 50
    
    • V1 and V2 are the default column names assigned when you convert a matrix into a data frame.
    • This specification wouldn't work well if you had an unknown and/or large number of columns to put into the polynomial (e.g. poly(V1, ..., V1000, degree=2, raw=TRUE)).

    If you don't know the number of columns in advance, a slightly hacky solution would be:

    # build e.g. "yData~poly(V1, V2, degree=2, raw=TRUE)" from the column count
    f <- as.formula(sprintf("yData~poly(%s, degree=2, raw=TRUE)",
                            paste("V", seq(ncol(xData)), sep="", collapse=", ")))
    polyModel <- lm(f, data=as.data.frame(xData))
    

    (untested)
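A sketch that runs the generalized approach end to end, under a few assumptions: three predictor columns (any number should work), 30 training rows, and the response stored in the data frame as a column named y (a hypothetical name) instead of being taken from the workspace. It builds the same poly(V1, V2, V3, degree = 2, raw = TRUE) term as a string, but passes it to base R's reformulate() rather than combining sprintf() with as.formula(); both routes should give the same formula.

```r
# Sketch: degree-2 raw polynomial in every column of a matrix, then predict
# on a new matrix with the same number of columns. Sizes are illustrative.
set.seed(42)
yData   <- rnorm(30)
xData   <- matrix(rnorm(90), 30, 3)    # 3 predictor columns
newData <- matrix(rnorm(150), 50, 3)   # 50 new rows, same 3 columns

# as.data.frame() names unnamed matrix columns V1, V2, V3
trainDF   <- as.data.frame(xData)
predCols  <- colnames(trainDF)
trainDF$y <- yData                     # hypothetical response column name

# Term label "poly(V1, V2, V3, degree = 2, raw = TRUE)"
polyTerm <- sprintf("poly(%s, degree = 2, raw = TRUE)",
                    paste(predCols, collapse = ", "))
f <- reformulate(polyTerm, response = "y")

polyModel  <- lm(f, data = trainDF)
yPredicted <- predict(polyModel, newdata = as.data.frame(newData))

length(fitted(polyModel))  # 30
length(yPredicted)         # 50
length(coef(polyModel))    # 10: intercept + 3 linear + 3 squared + 3 cross terms
```

Keeping y inside trainDF means the only names predict() has to resolve from newdata are V1 to V3.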