I am trying to run a large amount of regressions in my panel data. In this simple example its an univariate regression, but in reality i'm trying to run a multivariate regression (mentioning this just case).
My data looks like this:
DATE | year | month | ASSET_CODE | RET | MKTRET |
---|---|---|---|---|---|
2000-01-01 | 2000 | 1 | 1 | -0.5 | 1 |
2000-01-01 | 2000 | 1 | 2 | 1 | 1 |
2000-01-02 | 2000 | 1 | 1 | 1 | 2 |
2000-01-02 | 2000 | 1 | 2 | 2 | 2 |
2000-01-03 | 2000 | 1 | 1 | 0.3 | 3 |
2000-01-03 | 2000 | 1 | 2 | -0.2 | 3 |
2000-02-01 | 2000 | 2 | 1 | -3 | 4 |
2000-02-01 | 2000 | 2 | 2 | -1 | 4 |
2000-02-02 | 2000 | 2 | 1 | 2 | 5 |
2000-02-02 | 2000 | 2 | 2 | 1 | 5 |
2000-02-03 | 2000 | 2 | 1 | 2 | 6 |
2000-02-03 | 2000 | 2 | 2 | 2 | 6 |
I want to group by month and asset code and run a regression RET ~ MKTRET (in each month for each asset code) and store the coefficients for the slope and intercept, and nothing else. I want the end result to be formatted like this:
year | month | ASSET_CODE | INTERCEPT | SLOPE |
---|---|---|---|---|
2000 | 1 | 1 | ||
2000 | 1 | 2 | ||
2000 | 2 | 1 | ||
2000 | 2 | 2 |
I tried the following approach, which should have given me a dataframe with all the necessary information (and a lot more, including other statistical info) which I could have used to reshape to get the end result:
regressions <- df %>%
group_by(month, year, ASSET_CODE) %>%
do(regressions_fit = lm(RET ~ MKTRET, data = ., na.rm = TRUE))
regressions_coef <- tidy(regressions, regressions_fit)
However, this approach did not work for me because I ran out of memory on my computer. I suspect it has to do with the fact that it stored ALL the information related to the regressions and not just the SLOPE and INTERCEPT which i am interested in. Is there a less resource-intensive way to do this?
flm
in the collapse package is like lm
but returns the coefficients only and is faster. An alternative would be to use coef(lm(...))
in place of flm
.
library(collapse)
library(dplyr)
df %>%
group_by(year, month, ASSET_CODE) %>%
summarize(as.data.frame(t(flm(RET ~ MKTRET, cur_group()))), .groups = "drop")
## # A tibble: 4 × 5
## year month ASSET_CODE `(Intercept)` MKTRET
## <int> <int> <int> <dbl> <dbl>
## 1 2000 1 1 -0.533 0.4
## 2 2000 1 2 2.13 -0.600
## 3 2000 2 1 -12.2 2.5
## 4 2000 2 2 -6.83 1.50
The data in reproducible form
df <- structure(list(DATE = c("2000-01-01", "2000-01-01", "2000-01-02",
"2000-01-02", "2000-01-03", "2000-01-03", "2000-02-01", "2000-02-01",
"2000-02-02", "2000-02-02", "2000-02-03", "2000-02-03"), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L,
2000L, 2000L), month = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), ASSET_CODE = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L), RET = c(-0.5, 1, 1, 2, 0.3, -0.2, -3, -1, 2, 1,
2, 2), MKTRET = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L,
6L)), class = "data.frame", row.names = c(NA, -12L))