
How can I better control the lm() estimate labels for an ordered factor?


I have data containing an ordered factor variable with many levels, such as this:

set.seed(1234)
y <- runif(100, 0, 100)
x <- rep_len(as.character(1991:2013), length.out = 100)
df <- data.frame(x = factor(x, ordered = TRUE), y)

When I use these data in lm(), R changes the names of estimated coefficients in a way that I think sacrifices clarity.

summary(lm(data = df, formula = y~x))

produces the following:

...Truncated...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  43.8665     2.9456  14.892   <2e-16 ***
x.L           0.5284    13.9151   0.038    0.970    
x.Q         -17.6699    14.1375  -1.250    0.215    
x.C           0.3310    13.9882   0.024    0.981    
x^4          -0.8420    14.0647  -0.060    0.952    
x^5          20.1605    14.0629   1.434    0.156  

...Truncated...

To avoid this, I could "un-order" the factor, but that is inconvenient in my situation. Is there a way to force R to use the actual level names? Also, what do x.L, x.Q and x.C mean, and why do only the first three estimates get names of that form?


Solution

  • Since no other answers have appeared, the best way to accomplish this seems to be to "un-order" the factor variable, fit the linear model, and then "re-order" the factor variable:

    df$x <- factor(df$x, ordered = FALSE)
    m <- lm(data = df, formula = y~x)
    df$x <- factor(df$x, ordered = TRUE)
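
  • On the second part of the question: by default R codes ordered factors with polynomial contrasts (contr.poly), and the suffixes come from that contrast matrix's column names: .L is linear, .Q quadratic, .C cubic, and higher degrees are written ^4, ^5 and so on. That is why only the first three terms have letter names. The sketch below is not part of the original answer. It is one possible alternative that leaves df$x ordered and instead asks lm() for treatment contrasts through its contrasts argument. It assumes the example data from the question and R's default contrasts option.

```r
# Recreate the example data from the question.
set.seed(1234)
y <- runif(100, 0, 100)
x <- rep_len(as.character(1991:2013), length.out = 100)
df <- data.frame(x = factor(x, ordered = TRUE), y)

# The second element of this option is the coding used for ordered
# factors ("contr.poly" unless the option has been changed).
getOption("contrasts")

# Column names of the polynomial contrast matrix; lm() appends these
# to the variable name to build the coefficient labels.
colnames(contr.poly(5))

# Request treatment (dummy) contrasts for x in this model only.
# Passing the name as a string lets R label columns by factor level.
m <- lm(y ~ x, data = df, contrasts = list(x = "contr.treatment"))
names(coef(m))

# The factor itself is untouched and still ordered.
is.ordered(df$x)
```

    With treatment contrasts each coefficient is the difference from the first level (1991 here), so the estimates are not comparable one-to-one with the polynomial ones, although both codings should give the same fitted values.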