I'm estimating a linear regression in first-differences straight out of the textbook:
I'm estimating the model in first-differences with the plm()
function from the plm
package and extracting residuals with the augment()
function from the broom
package. I'm getting an error message and suspect I may not be using the "fd"
option correctly and/or misusing augment()
. A similar attempt for model="pooling"
appears to work. Help appreciated!
library(AER)
data(Fatalities)
Fatalities$fatality <- Fatalities$fatal / Fatalities$pop * 10000
library(plm)
library(broom)
plm.pool <- plm(fatality ~ beertax, data=Fatalities, model="pooling")
tidy(plm.pool) # ok
augment(plm.pool) # ok
plm.fd <- plm(fatality ~ beertax, data=Fatalities,
index=c("state", "year"),
model="fd")
tidy(plm.fd) # looks ok
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -0.00314 0.0119 -0.263 0.792
2 beertax 0.0137 0.285 0.0480 0.962
augment(plm.fd) # not ok
Error in `$<-.data.frame`(`*tmp*`, ".resid", value = c(`2` = 0.219840293582125, :
replacement has 288 rows, data has 336
In addition: Warning message:
In get(.Generic)(e1, e2) :
longer object length is not a multiple of shorter object length
EDIT: A WORKAROUND
So I suspect the problem has something to do with the fact that the model returned by plm
and the residuals do not have the same number of rows:
length(row.names(plm.fd$model))
is 336
length(names(plm.fd$residuals))
is 288.
Can someone tell me if the following is the correct way to get residuals and fitted values from the first-difference estimation?
data.frame(".rownames" = row.names(plm.fd$model), plm.fd$model) %>%
left_join(data.frame(".rownames" = names(resid(plm.fd)),
".fitted" = fitted(plm.fd),
".resid" = resid(plm.fd)
)) -> Fatalities.augmented
head(Fatalities.augmented)
.rownames fatality beertax .fitted .resid
1 1 2.12836 1.539379 NA NA
2 2 2.34848 1.788991 0.0034166261 0.219840294
3 3 2.33643 1.714286 -0.0010225479 -0.007890716
4 4 2.19348 1.652542 -0.0008451287 -0.138968054
5 5 2.66914 1.609907 -0.0005835833 0.479380363
6 6 2.71859 1.560000 -0.0006831177 0.053269973
References:
Reference for Edit:
This is due to a misunderstanding or non-special casing the first-difference (FD) panel model in broom::augment_columns
: the function assumes the residuals of the FD model have the same length as the predicted values.
More concretly, this line: ret$.resid <- residuals0(x)
(https://github.com/tidymodels/broom/blob/069c21e903174fcf5d491091b7c347a9fdcd2999/R/utilities.R#L256)
FD models compress the data, so the number of residuals is lower than the number of observations used for model estimation. You can see that in the summary
output:
summary(panel3) # FD model
Oneway (individual) effect First-Difference Model
[...]
Balanced Panel: n = 90, T = 7, N = 630
Observations used in estimation: 540
[...]
While the model has an input of 630 observations, after FD transformation, only 540 transformed observations are used as one loses one observation per group (individual dimension) -> 630 - 90 = 540.
The broom:augment_columns
wants to put the predicted values (630) and the residuals (540) in the same data frame, this is bound to fail. If they want to do it, they could pad values with NA (e.g., the first row for each individual set to NA).
My suggestion is to make developers/the maintainer of broom aware of this issue (and maybe this post). plm's FD panel models are identified via plm_object$args$model == "fd"
.