I have data containing an ordered factor variable with many levels, such as this:
set.seed(1234)
y <- runif(100, 0, 100)
x <- rep_len(as.character(1991:2013), length.out = 100)
df <- data.frame(x = factor(x, ordered = TRUE), y)
When I use these data in lm(), R changes the names of the estimated coefficients in a way that I think sacrifices clarity.
summary(lm(data = df, formula = y~x))
produces the following:
...Truncated...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  43.8665     2.9456  14.892   <2e-16 ***
x.L           0.5284    13.9151   0.038    0.970
x.Q         -17.6699    14.1375  -1.250    0.215
x.C           0.3310    13.9882   0.024    0.981
x^4          -0.8420    14.0647  -0.060    0.952
x^5          20.1605    14.0629   1.434    0.156
...Truncated...
To avoid this, I could "un-order" the factor, but that is inconvenient in my circumstances. Is there a way to force R to use the actual level names? Also, why are only the first three estimates named x.L, x.Q and x.C, while the rest are named x^4, x^5 and so on?
In the absence of answers, the simplest workaround I have found is to "un-order" the factor variable, fit the linear model, and then "re-order" the factor variable:
df$x <- factor(df$x, ordered = FALSE)
m <- lm(data = df, formula = y~x)
df$x <- factor(df$x, ordered = TRUE)
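For what it's worth, the odd names seem to come from R's default contrasts rather than from lm() itself: ordered factors get polynomial contrasts (contr.poly), whose columns are labelled .L, .Q and .C for linear, quadratic and cubic, then ^4, ^5 and so on for higher degrees. The sketch below shows one possible alternative to un-ordering, assuming treatment contrasts are what you actually want: pass them through the contrasts argument of lm() for that fit only. The object names poly_names, m_poly and m_trt are mine, purely for illustration.

```r
# Recreate the data from the question.
set.seed(1234)
y <- runif(100, 0, 100)
x <- rep_len(as.character(1991:2013), length.out = 100)
df <- data.frame(x = factor(x, ordered = TRUE), y)

# The second entry of this option is the contrast used for ordered factors
# (contr.poly unless it has been changed).
getOption("contrasts")

# The coefficient suffixes are the column names of the contr.poly() matrix.
poly_names <- colnames(contr.poly(6))   # ".L" ".Q" ".C" "^4" "^5"

# Default fit: polynomial contrasts, names such as x.L, x.Q, x.C, x^4, ...
m_poly <- lm(y ~ x, data = df)

# Same model, but with treatment contrasts requested for x in this fit only.
# Passing the contrast by name (a string) keeps the level names in the output.
m_trt <- lm(y ~ x, data = df, contrasts = list(x = "contr.treatment"))

head(names(coef(m_trt)))   # "(Intercept)" "x1992" "x1993" ...
is.ordered(df$x)           # the factor itself is untouched

# Both fits describe the same model; only the parameterisation differs.
all.equal(fitted(m_poly), fitted(m_trt))
```

Note that the two sets of coefficients answer different questions: the treatment coefficients are differences from the first level (1991), while the polynomial ones are trend components across the ordered levels, so the individual estimates and p-values are not comparable between the two fits.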