We're running a panel regression with the plm() function from the R package plm and want to add the fitted values as a new column to the dataset the regression was estimated on.
MP_regression <- plm(operating_exp ~ HHI + rate + rate_lag1 + rate_lag2 +
                       HHI*rate + HHI*rate_lag1 + HHI*rate_lag2,
                     data = market_power_merged, effect = "individual",
                     model = "within", index = c("firm", "date"))
When we call fitted(MP_regression) like this:
fitted_values <- fitted(MP_regression)
then it returns fewer fitted values than there are observations in the input data for the regression. We therefore want to add them back to the market_power_merged data frame by date and firm. Because fitted() returns fewer values (for a reason we don't understand), it is important to match on both date and firm, so that we can see which observations were excluded, or alternatively remove the observations for which no fitted value is produced.
In essence we want to:
market_power_merged <- mutate(market_power_merged, fitted_values = fitted(MP_regression))
and match them by firm (individual) and date (time).
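The value returned by fitted() carries an index attribute, a data frame holding the panel identifiers (individual and time) of each fitted value. Rows with a missing value in any model variable are typically dropped before estimation, which is the usual reason for getting fewer fitted values than input rows. You can therefore cbind this index attribute to the fitted values and then run left_join or merge (with all.x=TRUE) against the original data frame: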
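fitted_values_vec <- fitted(MP_regression)
# Bind the panel identifiers (firm, date) to the fitted values
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)
# Keep every row of the original data; rows without a fitted value get NA
market_power_merged <- base::merge(market_power_merged, fitted_values_df,
                                   by = c("firm", "date"), all.x = TRUE)
# market_power_merged <- dplyr::left_join(market_power_merged, fitted_values_df, by = c("firm", "date"))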
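The join step, and the follow-up the question asks about (seeing or dropping the rows that have no fitted value), can be sketched in base R on toy data. The data frames panel and fits and all their values are made up for illustration; fits stands in for the index attribute bound to the fitted values, under the assumption that one row was dropped by the model.

```r
# Toy panel: 3 firms x 2 dates (hypothetical values)
panel <- data.frame(
  firm = rep(c("A", "B", "C"), each = 2),
  date = rep(c(2001, 2002), times = 3),
  y    = c(1.0, 1.2, NA, 2.1, 3.0, 3.3)
)

# Stand-in for the fitted values plus their index:
# the row with the missing y (firm B, 2001) is absent
fits <- data.frame(
  firm          = c("A", "A", "B", "C", "C"),
  date          = c(2001, 2002, 2002, 2001, 2002),
  fitted_values = c(1.1, 1.1, 2.0, 3.1, 3.2)
)

# Left join: every row of panel is kept, unmatched rows get NA
merged <- merge(panel, fits, by = c("firm", "date"), all.x = TRUE)

# Observations that received no fitted value
excluded <- merged[is.na(merged$fitted_values), c("firm", "date")]

# Alternatively, keep only the observations that have a fitted value
kept <- merged[!is.na(merged$fitted_values), ]

print(excluded)
```

Note that merge returns the rows sorted by the key columns, so the row order of the result may differ from the original data frame.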
To demonstrate with plm's built-in data frame, Produc:
data("Produc", package = "plm")
# ASSIGN RANDOM NAs ACROSS NON-PANEL COLUMNS
set.seed(41120)
for(col in names(Produc)[!names(Produc) %in% c("state", "year")]) {
  Produc[sample(nrow(Produc), 50), col] <- NA
}
results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
               data = Produc, index = c("state","year"))
fitted_values_vec <- fitted(results)
str(fitted_values_vec)
# 'pseries' Named num [1:588] -0.2459 -0.2274 -0.0927 -0.0981 -0.0184 ...
# - attr(*, "names")= chr [1:588] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" ...
# - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 588 obs. of 2 variables:
# ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 5 6 7 8 9 10 12 13 ...
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)
Produc <- merge(Produc, fitted_values_df, by = c("state","year"), all.x = TRUE)
Output
head(Produc,10)
# state year region pcap hwy water util pc gsp emp unemp fitted_values
# 1 ALABAMA 1970 6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5 4.7 -0.24591969
# 2 ALABAMA 1971 6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9 5.2 -0.22735513
# 3 ALABAMA 1972 6 15972.41 7765.42 1764.75 6442.23 NA 31303 1072.3 NA NA
# 4 ALABAMA 1973 <NA> NA 7907.66 1742.41 6756.19 40084.01 33430 1135.5 3.9 NA
# 5 ALABAMA 1974 6 16762.67 8025.52 NA 7002.29 42057.31 33749 1169.8 5.5 -0.09272471
# 6 ALABAMA 1975 6 17316.26 8158.23 NA 7405.76 43971.71 33604 1155.4 7.7 -0.09806212
# 7 ALABAMA 1976 6 17732.86 NA 1799.74 7704.93 50221.57 35764 1207.0 6.8 -0.01841929
# 8 ALABAMA 1977 6 18111.93 8365.67 1845.11 7901.15 51084.99 37463 1269.2 7.4 0.02047675
# 9 ALABAMA 1978 6 18479.74 8510.64 1960.51 8008.59 52604.05 39964 1336.5 6.3 0.07225304
# 10 ALABAMA 1979 6 18881.49 8640.61 2081.91 8158.97 54525.86 40979 1362.0 7.1 0.09364171
tail(Produc,10)
# state year region pcap hwy water util pc gsp emp unemp fitted_values
# 807 WYOMING 1977 8 4037.03 2898.34 291.64 847.04 19977.67 9779 170.5 3.6 0.0871588
# 808 WYOMING 1978 8 4115.61 2920.85 294.73 900.04 20760.24 11038 187.4 NA NA
# 809 WYOMING 1979 8 4268.71 2950.53 313.47 1004.71 21643.50 11988 200.7 2.8 0.2346269
# 810 WYOMING 1980 8 NA 2979.23 338.06 1082.40 22628.22 13027 210.2 4.0 NA
# 811 WYOMING 1981 8 4572.67 3005.62 379.19 1187.86 26330.20 13717 223.5 4.1 0.3704301
# 812 WYOMING 1982 8 4731.98 3060.64 408.43 1262.90 27724.96 13056 217.7 5.8 0.3595080
# 813 WYOMING 1983 8 4950.82 3119.98 445.59 NA 28586.46 11922 NA 8.4 NA
# 814 WYOMING 1984 8 5184.73 3195.68 476.57 NA 28794.80 12073 204.3 6.3 0.3199823
# 815 WYOMING 1985 8 5448.38 3295.92 523.01 1629.45 29326.94 12022 NA 7.1 NA
# 816 WYOMING 1986 8 5700.41 3400.96 565.58 1733.88 27110.51 NA 196.3 9.0 NA
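As a sanity check on why rows go missing, one can compare the number of fitted values with the number of rows that are complete on the model variables. The following is only a sketch: it assumes the plm package is installed and that missing values are the only reason rows are dropped. The names model_vars and usable are introduced here for illustration.

```r
library(plm)

data("Produc", package = "plm")

# Same random NAs as in the demonstration
set.seed(41120)
for (col in setdiff(names(Produc), c("state", "year"))) {
  Produc[sample(nrow(Produc), 50), col] <- NA
}

results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
               data = Produc, index = c("state", "year"))

# Variables that appear in the model formula (hypothetical helper vector)
model_vars <- c("gsp", "pcap", "pc", "emp", "unemp")

# TRUE for rows with no missing value in any model variable
usable <- complete.cases(Produc[, model_vars])

# Expected to be TRUE if NA handling alone explains the dropped rows
print(length(fitted(results)) == sum(usable))

# The observations that could not receive a fitted value
print(head(Produc[!usable, c("state", "year", model_vars)]))
```

NAs in columns that are not part of the formula (region, hwy, water, util) do not cost an observation, which is why rows such as ALABAMA 1974 still have a fitted value in the output.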