I am running a lag-distributed model ordinary least squares in which a set of units are all treated in the same year. I am including a 2 lags and 2 leads to see if there were any "anticipation" or "long-term" effects by generating the same treatment status in the previous and following years. However, when I run the model all of the standard errors are the same. For example:
set.seed(123)
data <- data.frame(y = rnorm(100), id = sort(rep(1:10,10)), years = rep(2000:2009,10))
tr <- ifelse(data$years == 2005 & data$id %in% c(2,3,4,5,6),1,0)
lead.2003 <- ifelse(data$year == 2003 & data$id %in% c(2,3,4,5,6), 1, 0)
lead.2004 <- ifelse(data$year == 2004 & data$id %in% c(2,3,4,5,6), 1, 0)
lag.2006 <- ifelse(data$year == 2006 & data$id %in% c(2,3,4,5,6), 1, 0)
lag.2007 <- ifelse(data$year == 2007 & data$id %in% c(2,3,4,5,6), 1, 0)
data <- cbind(data, tr, lead.2003, lead.2004, lag.2006 , lag.2007)
summary(lm(y ~ tr + lead.2003 + lead.2004 + lag.2006 + lag.2007, data = data))
Any ideas why this is? Thanks.
The variance of your tr/lead/lag variables is the same and they are orthogonal, so they end up splitting the mean square error evenly. The SE is calculated by sigma^2 * (X'X)^-1. Take a look at the inverse diagonal of the covariance matrix ( (X'X)^-1 in the formula ).
fit <- lm(y ~ tr + lead.2003 + lead.2004 + lag.2006 + lag.2007, data = data)
X <- as.matrix(cbind(1, data[,4:ncol(data)]))
diag(solve(t(X) %*% X))
# 1 tr lead.2003 lead.2004 lag.2006 lag.2007
# 0.01333333 0.21333333 0.21333333 0.21333333 0.21333333 0.21333333
so, when you multiply this by the mean square error (sigma^2 in the formula, we can get this from an anova), and take the square root to get the standard error
mse <- anova(fit)[[3]][6] # mean square error
sqrt(diag(mse * solve(t(X) %*% X))) # SE
# 1 tr lead.2003 lead.2004 lag.2006 lag.2007
# 0.1059099 0.4236397 0.4236397 0.4236397 0.4236397 0.4236397
you end up with the same standard errors for them all.
You would get different standard errors if the variables had different variances, for example, changing lead.2003
, and rerunning the same model (the variables are still orthogonal)
## changed 2:6 to 3:6
lead.2003 <- ifelse(data$year == 2003 & data$id %in% 3:6, 1, 0)
## ... rerun model ...
summary(fit)$coefficients[,2] # standard errors
# (Intercept) tr lead.2003 lead.2004 lag.2006 lag.2007
# 0.1048613 0.4220586 0.4689540 0.4220586 0.4220586 0.4220586
Or, if the variables weren't orthogonal the standard errors would be different.
## again changing lead.2003
lead.2003 <- sample(c(rep(1,5), rep(0,95)), 100)
## .. rerun ..
summary(fit)$coefficients[,2]
# (Intercept) tr lead.2003 lead.2004 lag.2006 lag.2007
# 0.1057283 0.4272960 0.4316341 0.4316341 0.4272960 0.4272960