I am trying to run a two-way fixed-effects panel regression using plm in R. First, I randomly generate some data. Then I create time and firm indices (two-way indexing, as usual in a panel dataset) and the explanatory variable of interest (zp.dummy). Then I create a panel data frame and try to fit a two-way fixed-effects panel regression via plm:
library(plm)
set.seed(0); z=rnorm(40) # generate random data
ztime=rep(c(1:10),4) # time index
zp.dummy=as.numeric(ztime>5) # a dummy to distinguish first 5 from last 5 time periods
zfirm=rep(sequence(4), each=10) # firm index
zp.rete=pdata.frame(cbind(ztime,zfirm,zp.dummy,z),index=c("ztime","zfirm"))
# create panel data frame indexed by time and firm
colnames(zp.rete)[4]="zp.rete" # rename a column in the panel data frame
zm1p=plm(zp.rete~zp.dummy, data=zp.rete, index=c("ztime","zfirm"), model="within", effect="twoways")
# run the panel regression via `plm`
When running the last line, I get this error message:
> Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, :
empty model
Question: What am I doing wrong?
I think I can achieve the desired result via lm:
zftime=as.factor(ztime) # turn time index into factor
zffirm=as.factor(zfirm) # turn firm index into factor
zm1 = lm(zp.rete$zp.rete~-1+zp.dummy+zffirm+zftime)
# two-way fixed effects regression via `lm`
How can I replicate the result from lm with plm?
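Look carefully at the output of the lm model: the coefficient of one factor level is not estimable (it is NA). That is because zp.dummy varies only over time, not across firms, so it is an exact linear combination of the time dummies (the sum of the dummies for periods 6 to 10). The data cannot separate its effect from the time effects.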
# NA coefficient:
summary(zm1)
model.matrix(zm1) ## looks suspicious
plm::detect.lindep(model.matrix(zm1)) ## collinear columns
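The collinearity can also be checked without plm. The following is a minimal base-R sketch that rebuilds the indices from the question; the names time_dummies, rebuilt, and X are introduced here only for illustration. Both checks should return TRUE.

```r
# Rebuild the indices from the question (base R only)
ztime    <- rep(1:10, 4)            # time index
zfirm    <- rep(1:4, each = 10)     # firm index
zp.dummy <- as.numeric(ztime > 5)   # 1 in periods 6 to 10, 0 otherwise

# One indicator column per time period (no intercept)
time_dummies <- model.matrix(~ 0 + factor(ztime))

# zp.dummy is the sum of the indicators for periods 6 to 10
rebuilt <- rowSums(time_dummies[, 6:10])
all(rebuilt == zp.dummy)

# Hence the design matrix behind the lm() call is rank deficient
X <- model.matrix(~ -1 + zp.dummy + factor(zfirm) + factor(ztime))
qr(X)$rank < ncol(X)
```

Because one column is redundant, lm drops it and reports NA for that coefficient instead of failing.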
Now, why does plm throw an error? It first transforms the data (two-way within transformation) and then runs a plain linear regression on the transformed data; the transformed right-hand side is called the model matrix. If we look at that model matrix (the data after transformation), we see that its only column, zp.dummy, consists entirely of zeros. A model whose only regressor is a zero-only column is not estimable, so plm rightly throws an error.
library(plm)
set.seed(0); z <- rnorm(40) # generate random data
ztime <- rep(c(1:10),4) # time index
zp.dummy <- as.numeric(ztime>5) # a dummy to distinguish first 5 from last 5 time periods
zfirm <- rep(sequence(4), each=10) # firm index
zp.data <- pdata.frame(cbind(ztime, zfirm, zp.dummy, z),index=c("zfirm", "ztime"))
# create panel data frame indexed by firm (individual) and time
colnames(zp.data)[4] <- "zp.rete" # rename a column in the panel data frame
# create model frame
mf <- model.frame(zp.data, zp.rete ~ zp.dummy)
# create model matrix
mm <- model.matrix(mf, model = "within", effect = "twoways")
all(mm == 0) # TRUE
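As a cross-check on the model matrix above, the two-way within transformation can be sketched by hand in base R. This assumes a balanced panel (true here: 4 firms, 10 periods each); the helper within_twoways is hypothetical and not part of plm.

```r
# Same data as above, without plm
set.seed(0); z <- rnorm(40)         # response
ztime    <- rep(1:10, 4)            # time index
zfirm    <- rep(1:4, each = 10)     # firm index
zp.dummy <- as.numeric(ztime > 5)   # regressor that varies only over time

# Two-way within transformation for a balanced panel:
# x_it - (firm mean) - (time mean) + (overall mean)
within_twoways <- function(x, id, time) {
  x - ave(x, id) - ave(x, time) + mean(x)
}

# The regressor is wiped out entirely by the transformation
dummy_within <- within_twoways(zp.dummy, zfirm, ztime)
all(dummy_within == 0)

# The response, by contrast, keeps some variation
z_within <- within_twoways(z, zfirm, ztime)
any(z_within != 0)
```

Any regressor that is constant across firms within each period is removed in the same way, which is why its effect cannot be estimated alongside time fixed effects.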