I want to run a regression in R using glm, but I get the contrasts error. Is there a way to do it?
mydf <- data.frame(Group=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
mod <- glm(formula = WL~New.Runner+Last.Run, family = binomial, data = mydf)
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Using the debug_contr_error and debug_contr_error2 functions defined here: How to debug “contrasts can be applied only to factors with 2 or more levels” error?, we can easily locate the problem: only a single level is left in the variable New.Runner.
info <- debug_contr_error2(WL ~ New.Runner + Last.Run, mydf)
info[c(2, 3)]
# 1
#[1] "N"
## the data frame that is actually used by `glm`
dat <- info$mf
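If those helper functions are not at hand, a rough version of the same check can be sketched in base R. The data frame toy below is made up for illustration (it is not the data from the question). The sketch relies on the fact that glm builds its model frame with drop.unused.levels = TRUE after removing incomplete rows, so a factor can end up with fewer levels than it started with.

```r
# Hypothetical toy data: New.Runner has two levels, but every "Y" row
# has a missing Last.Run and is therefore removed before fitting.
toy <- data.frame(
  WL         = c(1, 0, 1, 0, 1, 0),
  New.Runner = factor(c("N", "N", "Y", "N", "Y", "N")),
  Last.Run   = c(1, 5, NA, 6, NA, 4)
)

# Build the model frame roughly the way glm() does:
# incomplete rows are omitted and unused factor levels are dropped.
mf <- model.frame(WL ~ New.Runner + Last.Run, data = toy,
                  drop.unused.levels = TRUE)

# Count the levels that remain in each factor column
is_fac <- vapply(mf, is.factor, logical(1))
n_used <- vapply(mf[is_fac], nlevels, integer(1))
n_used

# Factors with fewer than 2 remaining levels trigger the contrasts error
names(n_used)[n_used < 2]
```

Any variable reported on the last line is a candidate for the treatment described in the rest of this answer.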
Contrasts cannot be applied to a single-level factor, since any kind of contrasts reduces the number of levels by 1. With 1 - 1 = 0 columns left, this variable would be dropped from the model matrix.
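The "levels minus one" rule is easy to see on a factor that does have enough levels. As a small illustration (the choice of 3 levels and treatment contrasts is arbitrary):

```r
# A 3-level factor gets 3 - 1 = 2 columns under treatment contrasts
C3 <- contr.treatment(3)
dim(C3)   # 3 rows (one per level), 2 columns

# With contrasts = FALSE the full dummy coding is returned instead:
# one indicator column per level
D3 <- contr.treatment(3, contrasts = FALSE)
dim(D3)   # 3 rows, 3 columns
```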
Well then, can we simply require that no contrasts be applied to a single-level factor? No. All contrasts methods forbid this:
contr.helmert(n = 1, contrasts = FALSE)
#Error in contr.helmert(n = 1, contrasts = FALSE) :
# not enough degrees of freedom to define contrasts
contr.poly(n = 1, contrasts = FALSE)
#Error in contr.poly(n = 1, contrasts = FALSE) :
# contrasts not defined for 0 degrees of freedom
contr.sum(n = 1, contrasts = FALSE)
#Error in contr.sum(n = 1, contrasts = FALSE) :
# not enough degrees of freedom to define contrasts
contr.treatment(n = 1, contrasts = FALSE)
#Error in contr.treatment(n = 1, contrasts = FALSE) :
# not enough degrees of freedom to define contrasts
contr.SAS(n = 1, contrasts = FALSE)
#Error in contr.treatment(n, base = if (is.numeric(n) && length(n) == 1L) n else length(n), :
# not enough degrees of freedom to define contrasts
Actually, if you think about it carefully, you will conclude that without contrasts, a factor with a single level is just a dummy variable of all 1s, i.e., the intercept. So you can definitely do the following:
dat$New.Runner <- 1 ## set it to 1, as if no contrasts were applied
mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = dat)
#(Intercept) New.Runner Last.Run
# 1.4582 NA -0.2507
You get an NA coefficient for New.Runner due to rank-deficiency. In fact, applying contrasts is a fundamental way to avoid rank-deficiency. It is just that when a factor has only one level, applying contrasts becomes a paradox.
Let's also have a look at the model matrix:
model.matrix(mod)
# (Intercept) New.Runner Last.Run
#1 1 1 1
#2 1 1 5
#3 1 1 2
#4 1 1 6
#5 1 1 5
#6 1 1 4
#8 1 1 3
#9 1 1 7
#10 1 1 2
#11 1 1 4
#12 1 1 9
#13 1 1 8
#15 1 1 3
#16 1 1 5
#17 1 1 1
#19 1 1 6
#20 1 1 10
#21 1 1 7
#22 1 1 9
#23 1 1 2
The (Intercept) and New.Runner columns are identical, so only one of them can be estimated. If you want to estimate New.Runner, drop the intercept:
glm(formula = WL ~ 0 + New.Runner + Last.Run, family = binomial, data = dat)
#New.Runner Last.Run
# 1.4582 -0.2507
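The effect of dropping the intercept can also be checked directly on the rank of the design matrix. The following is only a sketch with made-up Last.Run values, not the model matrix shown above:

```r
# Hypothetical predictor values, for illustration only
Last.Run <- c(1, 5, 2, 6, 5, 4)

# Design matrix with an intercept and a constant New.Runner column
X1 <- cbind(`(Intercept)` = 1, New.Runner = 1, Last.Run = Last.Run)
qr(X1)$rank   # 2, although X1 has 3 columns

# Dropping the intercept leaves two linearly independent columns
X0 <- cbind(New.Runner = 1, Last.Run = Last.Run)
qr(X0)$rank   # 2, equal to ncol(X0)
```

A rank smaller than the number of columns means at least one coefficient cannot be estimated, which is what glm reports as NA.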
Make sure you digest the rank-deficiency issue thoroughly. If you have more than one single-level factor and you replace all of them by 1, dropping the single intercept still leaves the model rank-deficient:
dat$foo.factor <- 1
glm(formula = WL ~ 0 + New.Runner + foo.factor + Last.Run, family = binomial, data = dat)
#New.Runner foo.factor Last.Run
# 1.4582 NA -0.2507