
Mapping broom::tidy to nested list of {fixest} models and keep name of list element


I want to apply broom::tidy() to models nested in a fixest_multi object and extract the names of each list level as data frame columns. Here's an example of what I mean.

library(fixest)
library(tidyverse)
library(broom)

multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)

This command estimates two models, one per dependent variable (Ozone and Solar.R), for each Month subset plus the full sample, so 12 models in total. Here's what the resulting object looks like:

> names(multiple_est)
[1] "Full sample" "5"           "6"           "7"           "8"           "9" 
> names(multiple_est$`Full sample`)
[1] "Ozone"   "Solar.R"
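A quick way to confirm that the object really holds 12 separate fits is sketched below. It assumes, as the answer further down does, that `as.list()` flattens a `fixest_multi` object into its individual models and that the labels sit in `attr(x, "meta")$all_names`; this internal layout may differ between fixest versions.

```r
library(fixest)

# Same call as in the question: 2 outcomes x (full sample + 5 months)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)

# Assumption: as.list() returns one estimation object per sample/outcome pair
models <- as.list(multiple_est)
length(models)

# Assumption: sample and outcome labels are stored in the "meta" attribute
nms <- attr(multiple_est, "meta")$all_names
nms$sample
nms$lhs
```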

I now want to tidy each model object, but keep the Month / dependent-variable combination as columns in the tidied data frame.

I can run map_dfr from the purrr package, which gives me this result:


> map_dfr(multiple_est, tidy, .id ="Month") %>% head(9)
# A tibble: 9 x 6
  Month       term        estimate std.error statistic  p.value
  <chr>       <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 Full sample (Intercept)   -71.0     23.6      -3.01  3.20e- 3
2 Full sample Wind           -3.06     0.663    -4.61  1.08e- 5
3 Full sample Temp            1.84     0.250     7.36  3.15e-11
4 5           (Intercept)   -76.4     82.0      -0.931 3.53e- 1
5 5           Wind            2.21     2.31      0.958 3.40e- 1
6 5           Temp            3.07     0.878     3.50  6.15e- 4
7 6           (Intercept)   -70.6     46.8      -1.51  1.45e- 1
8 6           Wind           -1.34     1.13     -1.18  2.50e- 1
9 6           Temp            1.64     0.609     2.70  1.29e- 2

But this tidies only the first model of each Month, the model with the Ozone outcome.

My desired output would look something like this:

Month       outcome  term         estimate    (more columns from tidy)
Full sample Ozone    (Intercept)  -71.0
Full sample Ozone    Wind         -3.06
Full sample Ozone    Temp          1.84
Full sample Solar.R  (Intercept)  some value
Full sample Solar.R  Wind         some value
Full sample Solar.R  Temp         some value

... rows repeated for each month 5, 6, 7, 8, 9

How can I apply tidy to all models and add another column that indicates the outcome of the model (which is stored in the name of the model object)?


Solution

  • So, as I delved deeper, I found that fixest_multi has a pretty strange setup. As you noticed, mapping across it or using apply only reaches part of the models. In fact, it doesn't return just the "Ozone" models: it returns the first 6 models in the object (both outcomes for c("Full sample", "5", "6")), labelled with the 6 top-level names. That is why the rows labelled "5" in your output are actually the full-sample Solar.R model.

    If you convert it to a list, it accesses the data attribute, which is a sequential list of all 12 models, but drops the names you're looking for. So, as a workaround, you could use pmap() together with the names (found in the object's "meta" attribute) to tidy() each model, and then use mutate() to add your desired columns.

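    library(fixest)
    library(tidyverse)
    library(broom)
    
    multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
    
    # Sample and outcome names are stored in the "meta" attribute
    nms <- attr(multiple_est, "meta")$all_names
    
    # as.list() gives all 12 models in order: for each sample, every outcome
    pmap_dfr(
      list(
        data = as.list(multiple_est),
        # each sample name repeated once per outcome
        month = rep(nms$sample, each = length(nms$lhs)),
        # the outcome names recycled once per sample
        outcome = rep(nms$lhs, length(nms$sample))
      ),
      # ..1 = model, ..2 = sample label, ..3 = outcome label
      ~ tidy(..1) %>%
        mutate(
          Month = ..2,
          outcome = ..3,
          .before = 1
        )
    )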
    #> # A tibble: 36 × 7
    #>    Month       outcome term        estimate std.error statistic  p.value
    #>    <chr>       <chr>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
    #>  1 Full sample Ozone   (Intercept)   -71.0     23.6      -3.01  3.20e- 3
    #>  2 Full sample Ozone   Wind           -3.06     0.663    -4.61  1.08e- 5
    #>  3 Full sample Ozone   Temp            1.84     0.250     7.36  3.15e-11
    #>  4 Full sample Solar.R (Intercept)   -76.4     82.0      -0.931 3.53e- 1
    #>  5 Full sample Solar.R Wind            2.21     2.31      0.958 3.40e- 1
    #>  6 Full sample Solar.R Temp            3.07     0.878     3.50  6.15e- 4
    #>  7 5           Ozone   (Intercept)   -70.6     46.8      -1.51  1.45e- 1
    #>  8 5           Ozone   Wind           -1.34     1.13     -1.18  2.50e- 1
    #>  9 5           Ozone   Temp            1.64     0.609     2.70  1.29e- 2
    #> 10 5           Solar.R (Intercept)  -284.     262.       -1.08  2.89e- 1
    #> # … with 26 more rows
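The workaround hinges on the two label vectors lining up with the sequential order of as.list(), which in the output above is sample-outer, outcome-inner. The base-R sketch below checks that alignment using hard-coded stand-ins for nms$sample and nms$lhs (hypothetical values copied from the printed output, not read from the fixest object), and shows that expand.grid(), which varies its first argument fastest, builds the same pairs.

```r
# Hypothetical stand-ins for nms$sample and nms$lhs
samples  <- c("Full sample", "5", "6", "7", "8", "9")
outcomes <- c("Ozone", "Solar.R")

# Labels as built in the answer: sample varies slowest, outcome fastest
month   <- rep(samples, each = length(outcomes))
outcome <- rep(outcomes, times = length(samples))

# expand.grid() varies its first argument fastest, so outcome cycles inside Month
grid <- expand.grid(outcome = outcomes, Month = samples, stringsAsFactors = FALSE)

# Both approaches should produce identical label vectors
stopifnot(identical(grid$Month, month), identical(grid$outcome, outcome))
head(data.frame(Month = month, outcome = outcome), 4)
```

With the real object, `expand.grid(outcome = nms$lhs, Month = nms$sample, stringsAsFactors = FALSE)` could stand in for the two rep() calls, again assuming as.list() keeps the sample-outer, outcome-inner order.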