Search code examples
rstatisticslmpoly

Why does R keep complaining that I'm giving the wrong type of matrix to my lm model when I'm not using matrices at all?


Specifically, I'm getting the error

Error: variable 'poly(x, degree = i, raw = TRUE)' was fitted with type "nmatrix.2" but type "nmatrix.5" was supplied

I was provided this starting code by my professor:

f = function(x) {
    return(x**2)
}

#Set a terribly boring seed for reproducibility
set.seed(1234)

#Make our training data
N = 100
x = sort(runif(N, 0, 1))
y = f(x) + rnorm(N, mean=0, sd=.3)

data_train = data.frame(x,y)

#Plot real quick
plot(x,y)

and the instructions for the problem say

fit four polynomials to the training data generated above: Fit polynomials of order 0, 1, 2, and 3 using the R function lm(), and name the resulting regression models fit0, fit1, fit2, and fit3 respectively. Additionally, plot the training data and the predictions for all four models (so you will have to make some predictions).

This is what I did so far:

fit0 = lm(formula = y ~ 1, data = data_train)

for(i in 1:5)
{
    assign(paste0("fit", i), lm(formula = y ~ poly(x, degree = i, raw = TRUE), 
    data = data_train))
}

plot(x,y)

abline(v = fit0$coef[1], col = 'blue')
abline(a = fit2$coef[1], b = fit2$coef[2], col = 'red')

y2 = predict(fit2, newdata = data.frame(x))

It works fine to create my fit variables and generate the scatterplot, but once I added the predict line (in order to get the values to plot for the quadratic model), it started giving me that weird error and I'm completely stumped as to why. I Googled the error message and I've been reading through the search results for hours, but the only example I found that seemed to actually match my problem was on a Japanese website, and Google translate clearly butchered the translation because the resulting grammar didn't make a lot of sense, so I couldn't really understand the posts.

I've also been reading and rereading my code to check that it matches the examples of how to plot polynomial regression equations in my course lecture notes and from my own previous examples of doing it that worked and I can't find anything meaningfully different here. I'm really confused because of the matrix thing in particular - I'm not using matrices at all,, so the only thing I can think of is that the messy and complicated structure of lm objects involves them somewhere? I just literally don't know what else to even try at this point, so any guidance would be greatly appreciated.


Solution

  • The problem lies with how the model is defined.

    'poly(x, degree = i, raw = TRUE)'
    

    In the model the degree term is defined to equal i. Since i is equal to 5 from exiting the last loop. The predict function is expecting a fifth order polynomial. "fit2" is only second order and thus the error.
    The work around is to redefine i before calling the predict function.

    plot(x,y)
    abline(h = fit0$coef[1], col = 'blue')
    
    i<-2
    y2 = predict(fit2, newdata = data.frame(x))
    lines(x, y2, col="red")
    
    i<-5
    y5 = predict(fit5, newdata = data.frame(x))
    lines(x, y5, col="green")
    

    There may be a way to redefine i inside the predict function, but I'm at a loss on how.

    Also note your call to abline should define a horizontal line and not vertical.

    enter image description here