I need to introduce fixed effects (in this case: country dummies) into an otherwise simple glm() in R.
The country fixed effects variables in my data look like this:
country  country_a  country_b  country_c  y  x  ...
1        1          0          0
1        1          0          0
2        0          1          1
2        0          1          1
Would the following be the technically correct way to implement it?
glm(y ~ x + country_a + country_b + country_c, family=binomial(link="logit"))
And if so, how would I set a specific country as the reference category? I know that I need to drop one country, because otherwise I would have perfect collinearity, and normally the dropped country would then be my reference country. But what if other countries "go NA" as well, simply because they appear only a few times in the data and therefore disappear from the analysis (listwise deletion)? Will country_a still be my reference category if I decide to drop it?
Or do I have to use the country variable (left column) in the first place and somehow tell glm() that it is an unordered factor? If so, how would I do that?
With data like:
> d
country y x
1 1 0.9610213 0.2586365
2 1 0.8561303 0.5972043
3 2 0.5463802 0.6412527
4 2 0.4703876 0.1126319
You can either convert to factor in the glm call:
> glm(y~factor(country),data=d)
Call: glm(formula = y ~ factor(country), data = d)
Coefficients:
(Intercept) factor(country)2
0.9086 -0.4002
Degrees of Freedom: 3 Total (i.e. Null); 2 Residual
Null Deviance: 0.1685
Residual Deviance: 0.008388 AIC: -7.317
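The same idea should carry over to the logit model from the question. Here is a minimal sketch, assuming a binary y, a covariate x and three countries; the data frame d2 and its simulated values are made up purely for illustration and are not the asker's data:

```r
# Hypothetical data for illustration only: three countries, binary outcome.
set.seed(1)
d2 <- data.frame(
  country = rep(1:3, each = 20),
  x       = rnorm(60)
)
d2$y <- rbinom(60, size = 1, prob = plogis(0.5 * d2$x))

# factor() tells glm() to treat the numeric country code as categorical,
# so it builds the dummies itself and leaves one level out as the baseline.
fit <- glm(y ~ x + factor(country), data = d2,
           family = binomial(link = "logit"))

# Coefficient names: one dummy per country except the baseline.
names(coef(fit))
```

With this approach there should be no need to keep the hand-made country_a, country_b, ... columns in the formula.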
Or make a new column that makes it explicit that it's not numeric:
> d$CountryCode = paste0("Country",d$country)
> d
country y x CountryCode
1 1 0.9610213 0.2586365 Country1
2 1 0.8561303 0.5972043 Country1
3 2 0.5463802 0.6412527 Country2
4 2 0.4703876 0.1126319 Country2
> glm(y~CountryCode,data=d)
Call: glm(formula = y ~ CountryCode, data = d)
Coefficients:
(Intercept) CountryCodeCountry2
0.9086 -0.4002
Degrees of Freedom: 3 Total (i.e. Null); 2 Residual
Null Deviance: 0.1685
Residual Deviance: 0.008388 AIC: -7.317
The missing factor level in the coefficient table is the baseline level, in this case Country1.
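To pick a specific baseline rather than the default first level, one option is relevel() from base R. The sketch below also shows one way to check which levels actually made it into the fit when a country drops out through listwise deletion. The data frame d3, the country labels "A", "B", "C" and the simulated values are assumptions made for illustration. My understanding is that glm() drops factor levels that are left without complete cases, so it seems safest to inspect the fitted object rather than assume the baseline:

```r
# Hypothetical data for illustration only.
set.seed(1)
d3 <- data.frame(
  country = factor(rep(c("A", "B", "C"), each = 20)),
  x       = rnorm(60)
)
d3$y <- rbinom(60, size = 1, prob = plogis(0.5 * d3$x))

# Make "B" the reference category instead of the default first level "A".
d3$country <- relevel(d3$country, ref = "B")

# Simulate a country that disappears through listwise deletion:
# every row of country "C" has a missing x.
d3$x[d3$country == "C"] <- NA

fit <- glm(y ~ x + country, data = d3, family = binomial(link = "logit"))

# The level without its own coefficient is the baseline.
names(coef(fit))

# Factor levels actually used in the fit; the first one is the baseline.
fit$xlevels
```

If the country chosen as reference were itself the one that lost all of its rows, the baseline would presumably shift to the next remaining level, which is why checking fit$xlevels after fitting seems worthwhile.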