Search code examples
rmle

Why my glm logit estimation is very biased?


I am doing some simulation work. I first used logit to get the probability of treatment for each observation, then use rbniom() to generate the binary treatment variable.

With treatment variable observed, I used glm with logit link to estimate the parameter gamma. It should be 1, but multiple tries (even with sample number increased), it is still around 0.3. Where does the bias come from?

Code is attached

set.seed(99)
n = 10000
for (rv in c('X1','X2', 'Z1', 'Z2','e','u')){
  assign(rv, rnorm(n =n, mean = 0, sd =5))
  # check values
  # get(rv), eval(as.name/symbol(rv))
}
X = cbind(X1,X2)
Z = cbind(Z1,Z2)
gamma = c(1,1)
# treatment probability for each observation
p_treatment = 1/(1+exp(-(X%*%gamma+e)))
# track treated or not
treated = mapply(FUN = rbinom, prob = p_treatment, size = 1, n = 1)
beta = c(1,1)
y = 1 + X%*%beta+treated+u
fit_lgt = glm(treated ~ X, family = binomial(link = 'logit'))
summary(fit_lgt)

Solution

  • This isn't a programming questions as much as a question about understanding your model. I don't particularly like how you coded the simulation but that isn't what I'll address here.

    In a generalized linear model you don't add random noise before applying the link. The line that is throwing things off is:

    p_treatment = 1/(1+exp(-(X%*%gamma+e)))
    

    You shouldn't be adding extra error so you should change this to:

    p_treatment = 1/(1+exp(-(X%*%gamma)))