Search code examples
rregressionpanel

Run a large amount of regressions and storing coefficients


I am trying to run a large amount of regressions in my panel data. In this simple example its an univariate regression, but in reality i'm trying to run a multivariate regression (mentioning this just case).

My data looks like this:

DATE year month ASSET_CODE RET MKTRET
2000-01-01 2000 1 1 -0.5 1
2000-01-01 2000 1 2 1 1
2000-01-02 2000 1 1 1 2
2000-01-02 2000 1 2 2 2
2000-01-03 2000 1 1 0.3 3
2000-01-03 2000 1 2 -0.2 3
2000-02-01 2000 2 1 -3 4
2000-02-01 2000 2 2 -1 4
2000-02-02 2000 2 1 2 5
2000-02-02 2000 2 2 1 5
2000-02-03 2000 2 1 2 6
2000-02-03 2000 2 2 2 6

I want to group by month and asset code and run a regression RET ~ MKTRET (in each month for each asset code) and store the coefficients for the slope and intercept, and nothing else. I want the end result to be formatted like this:

year month ASSET_CODE INTERCEPT SLOPE
2000 1 1
2000 1 2
2000 2 1
2000 2 2

I tried the following approach, which should have given me a dataframe with all the necessary information (and a lot more, including other statistical info) which I could have used to reshape to get the end result:

regressions <- df %>%
  group_by(month, year, ASSET_CODE) %>%
  do(regressions_fit = lm(RET ~ MKTRET, data = ., na.rm = TRUE))
regressions_coef <- tidy(regressions, regressions_fit) 

However, this approach did not work for me because I ran out of memory on my computer. I suspect it has to do with the fact that it stored ALL the information related to the regressions and not just the SLOPE and INTERCEPT which i am interested in. Is there a less resource-intensive way to do this?


Solution

  • flm in the collapse package is like lm but returns the coefficients only and is faster. An alternative would be to use coef(lm(...)) in place of flm.

    library(collapse)
    library(dplyr)
    
    df %>%
      group_by(year, month, ASSET_CODE) %>%
      summarize(as.data.frame(t(flm(RET ~ MKTRET, cur_group()))), .groups = "drop")
    ## # A tibble: 4 × 5
    ##    year month ASSET_CODE `(Intercept)` MKTRET
    ##   <int> <int>      <int>         <dbl>  <dbl>
    ## 1  2000     1          1        -0.533  0.4  
    ## 2  2000     1          2         2.13  -0.600
    ## 3  2000     2          1       -12.2    2.5  
    ## 4  2000     2          2        -6.83   1.50 
    

    Note

    The data in reproducible form

    df <- structure(list(DATE = c("2000-01-01", "2000-01-01", "2000-01-02", 
    "2000-01-02", "2000-01-03", "2000-01-03", "2000-02-01", "2000-02-01", 
    "2000-02-02", "2000-02-02", "2000-02-03", "2000-02-03"), year = c(2000L, 
    2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 
    2000L, 2000L), month = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L), ASSET_CODE = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L), RET = c(-0.5, 1, 1, 2, 0.3, -0.2, -3, -1, 2, 1, 
    2, 2), MKTRET = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 
    6L)), class = "data.frame", row.names = c(NA, -12L))