Search code examples
rggplot2lmpoly

Error when trying to produce a residual panel plot using a polynom in an lm


I think I've encountered a bug when plotting diagnostic plots for models made with lm() including a poly() term using ggResidPanel::resid_panel().

df <- data.frame(x=rnorm(100),
                 y=rnorm(100))

limo <- lm(y~poly(x, degree=2, raw=T), data=df)

summary(limo)

resid_panel(limo) ### this is where the error is

Error in [<-.data.frame`(`*tmp*`, , i, value = c("poly(x, degree = 2): 0.95", : 
  replacement has 200 rows, data has 100

I initially thought this could have something to do with the specifications of the poly function, e.g., the orthogonalization of the the linear and polynomial terms, but have exhausted all of the settings in the poly() function.

I do not know how the poly() function "names" the terms, but am now thinking that there is some bugginess, which is why the error thinks that there are 200 rows for x when there only should be 100.

Any pointers, alternatives to poly() or corrections to poly would be highly appreciated.


Solution

  • This error is due to the poly function returning a matrix. And when combined to the model, the matrix is considered to be one column. Take a look below:

    limo <- lm(y~poly(x, degree=2, raw=T), data=df)
    head(model.frame(limo)) # head(limo$model)
    
               y poly(x, degree = 2, raw = T).1 poly(x, degree = 2, raw = T).2
    1  2.0079704                     0.19350954                     0.03744594
    2  0.9932644                    -0.21109075                     0.04455930
    3  0.4524215                     0.45671674                     0.20859018
    4 -1.1717744                    -1.01049661                     1.02110340
    5 -1.3694232                    -1.12266529                     1.26037736
    6  1.2252210                    -1.27126749                     1.61612102
    

    While the above seems to be a dataframe of 3 columns, that is not the case. It has 2 columns.

    ncol(limo$model)
    [1] 2
    

    And the second column is a matrix with 2 columns:

    ncol(limo$model[[2]])
    [1] 2
    

    A matrix is just a vector with dimension attribute. its length is nrow*ncol

    This here is the issue. The moment the data is passed into the resid_panel function, you end up having the length of the matrix as your x hence not same length as y.

    Caviet:

    A small workaround is to change the model frame into a dataframe:

    limo$model <- do.call(cbind.data.frame, limo$model)
    summary(limo)
    resid_panel(limo)