Search code examples
rregressionlinear-regressionmontecarlo

Measurement error in the dependent variable of an OLS regression?


I am calculating a Monte-Carlo regression to analyse the effects of a measurement error in the dependent variable on an OLS estimation. The theory on this is clear. The estimation of the constant and the slope coefficient should be correct on average. However, my R code results in a biased constant but an unbiased slope coefficient. I suspect that I made a mistake when adding the measurement error to the data generating process?

b1=c()
b2=c()
x1 = rnorm(1000,mean=2,sd=1)
me=runif(1000,0,1)
graph<-data.frame(b1=b1,b2=b2)
for (i in 1:500){
  e = rnorm(1000,mean=0,sd=1)
  y=0.2+0.5*x1+e #true value
  y_o=y+me #observed value 
  c=lm(I(y_o)~I(x1)) 
  b1[i]=data.frame(c$coefficients)[1,1]
  b2[i]=data.frame(c$coefficients)[2,1]
}

plot5ab = ggplot(data=graph) +
  stat_density(aes(x=b1,fill="black"),colour="black",adjust=1.5, alpha=.05)+
  geom_histogram(aes(x=b1,y=..density..),fill = "black", alpha = 0.2,bins =50) +
  stat_density(aes(x=b2,fill="red"),adjust=1.5, alpha=.05,colour="black")+ 
  geom_histogram(aes(x=b2,y=..density..),fill = "red", alpha = 0.2,bins =50) +
  geom_vline(xintercept=0.2, linetype="dashed", color = "blue")+
  geom_vline(xintercept=0.5, linetype="dashed", color = "blue")+
  scale_fill_identity(name = NULL, 
                      labels = c(black = "Alpha", red = "Beta"
                      ),
                      guide = "legend")+
  theme_bw()+
  theme(legend.position = "bottom",
        axis.text.x     = element_text(angle = 90)) +
  labs(title = "Density of estimated parameters", subtitle = "R=500; Dashed blue lines = true parameters",
       y     = "Density",
       x     = "x")
plot5ab

Solution

  • You need to add error uniformly around y. Instead, what you are doing is adding on average 0.5 to y (i.e. me is centered around 0.5), and thus, the average value of your estimated intercept is the truth (0.2) plus this average, resulting in a distribution of estimated intercepts with mode around 0.2+0.5, or 0.7.

    Instead, I believe you want to define your measurement error to be centered around zero.

    Try:

    me = runif(1000, -.3, .3)
    

    Also, graph<-data.frame(b1=b1,b2=b2) should be added to your original post.