Search code examples
rlinear-regression

How do I make predictions using an ordered factor coefficient in R?


I'm currently trying to develop my understanding of ordered factors in R and using them as dependent variables in a linear model. I understand the outputs .L ,.Q and .C represent linear, quadratic and cubic but I'm wondering what is the "newx" that can be used the equations below to derive estimates for each level of my ordered factor.

I thought the "newx" was derived from the contr.poly() function but using this leads to a mismatch between my equation and the results derived from the predict() function. Can anyone help me understand what "newx" should be?

set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)
d$x = factor(d$x, labels=c("none", "some", "more", "a lot"))
Coefs <- coef(lm(y~ordered(x), d))

newx <- contr.poly(4)

predict(lm(y~ordered(x), d), newdata = data.frame(x = as.factor(c("none", "some", "more", "a lot"))))

Coefs[1]+(Coefs[2]*newx[1,1])+(Coefs[3]*newx[1,2]^2)+(Coefs[4]*newx[1,3]^3)
Coefs[1]+(Coefs[2]*newx[2,1])+(Coefs[3]*newx[2,2]^2)+(Coefs[4]*newx[2,3]^3)
Coefs[1]+(Coefs[2]*newx[3,1])+(Coefs[3]*newx[3,2]^2)+(Coefs[4]*newx[3,3]^3)
Coefs[1]+(Coefs[2]*newx[4,1])+(Coefs[3]*newx[4,2]^2)+(Coefs[4]*newx[4,3]^3)

Solution

  • Just give R a data frame with x values drawn from the levels of the factor ("none", "some", etc.), and it will do the rest.

    I changed your setup slightly to change the type of x to ordered() within the data frame (this will carry through all of the computations).

    d$x = ordered(d$x, labels=c("none", "some", "more", "a lot"))                                                                                                                                                                               
    m1 <- lm(y~x, d)      ## save fitted object                                                                                                     
    Coefs <- coef(m1)   
    

    Now we can predict():

    predict(m1, newdata =  data.frame(x=c("none","more"))) 
    ##         1        2
    ##  2.993959 6.997342       
    

    (didn't have to explicitly say that the new x was ordered())

    If you want to dig a little bit deeper into the computations, you can look at the model matrix:

    model.matrix(~unique(d$x))    
    

    For each level of the factor, these are the values R multiplies the coefficients by to generate the prediction (e.g. for level = "none", 1*b0 + (-0.67)*b1 + 0.5*b2 - 0.223*b3)

       (Intercept) unique(d$x).L unique(d$x).Q unique(d$x).C                                                                   
     1           1    -0.6708204           0.5    -0.2236068                                                                   
     2           1    -0.2236068          -0.5     0.6708204                                                                   
     3           1     0.2236068          -0.5    -0.6708204                                                                   
     4           1     0.6708204           0.5     0.2236068
    

    For even more detail, look at ?poly or the source code of poly() (although neither of these is easy!)