r dplyr tidyverse purrr

Additional arguments to purrr::map don't work as expected


I'm using the purrr::map function to iterate over several columns and tidy the result. For a short example, I provide the following code:

library(tidymodels)
library(broom)

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(broom::tidy, .id = "var")
# A tibble: 12 × 6
   var               term                       estimate std.error statistic   p.value
   <chr>             <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 bill_length_mm    (Intercept)                 38.8       0.241    161.    2.47e-322
 2 bill_length_mm    penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 bill_length_mm    penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 bill_depth_mm     (Intercept)                 18.3       0.0912   201.    0        
 5 bill_depth_mm     penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 bill_depth_mm     penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 flipper_length_mm (Intercept)                190.        0.540    351.    0        
 8 flipper_length_mm penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 flipper_length_mm penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 body_mass_g       (Intercept)               3701.       37.6       98.4   2.49e-251
11 body_mass_g       penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 body_mass_g       penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

This works as expected.

However, when I map functions with additional arguments, I usually use an anonymous function, as suggested in the documentation. When I try that in this example, changing only the last line of the previous code, I get the tidy table with all regression results, but without the "var" column that tells me which variable was included in each regression:

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(\(x) broom::tidy(x, .id = "var"))
# A tibble: 12 × 5
   term                       estimate std.error statistic   p.value
   <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                 38.8       0.241    161.    2.47e-322
 2 penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 (Intercept)                 18.3       0.0912   201.    0        
 5 penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 (Intercept)                190.        0.540    351.    0        
 8 penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 (Intercept)               3701.       37.6       98.4   2.49e-251
11 penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

The formula shorthand gives the same result:

> penguins %>% 
+   select(where(is.numeric)) %>% 
+   map(\(x) lm(x ~ penguins$species, .)) %>% 
+   map_df(~ broom::tidy(.x, .id = "var"))
# A tibble: 12 × 5
   term                       estimate std.error statistic   p.value
   <chr>                         <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)                 38.8       0.241    161.    2.47e-322
 2 penguins$speciesChinstrap   10.0       0.432     23.2   4.23e- 72
 3 penguins$speciesGentoo       8.71      0.360     24.2   5.33e- 76
 4 (Intercept)                 18.3       0.0912   201.    0        
 5 penguins$speciesChinstrap    0.0742    0.164      0.453 6.50e-  1
 6 penguins$speciesGentoo      -3.36      0.136    -24.7   7.93e- 78
 7 (Intercept)                190.        0.540    351.    0        
 8 penguins$speciesChinstrap    5.87      0.970      6.05  3.79e-  9
 9 penguins$speciesGentoo      27.2       0.807     33.8   1.84e-110
10 (Intercept)               3701.       37.6       98.4   2.49e-251
11 penguins$speciesChinstrap   32.4      67.5        0.480 6.31e-  1
12 penguins$speciesGentoo    1375.       56.1       24.5   5.42e- 77

What is the reason for this behavior?


Solution

  • The problem is that .id = "var" is not an argument of broom::tidy() but of purrr::map_df(). Under the hood, purrr::map_df() first works like purrr::map() and returns a list, then calls dplyr::bind_rows() to combine that list into a data frame. The .id argument is passed on to bind_rows(), which turns the names of the list into a column whose name is the value given in .id. When you write .id = "var" inside the anonymous function, it goes to broom::tidy() instead, which discards it unless the tidying method has such an argument, so map_df() never receives it and bind_rows() adds no name column. This is why your column is missing.
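
As a minimal sketch of the fix described above, the anonymous function can wrap only broom::tidy(), while .id stays an argument of map_df(). The built-in iris data is used here as a stand-in for penguins so the example runs without extra data packages; the object names (models, with_id, with_names_to) are made up for illustration. The second variant assumes purrr 1.0.0 or later, where map_df() is superseded in favor of map() followed by list_rbind().

```r
library(purrr)
library(broom)

# Fit one model per numeric column (iris is a stand-in for penguins;
# this is an assumption made only to keep the example self-contained).
models <- iris[sapply(iris, is.numeric)] |>
  map(\(x) lm(x ~ iris$Species))

# .id is given to map_df(), not to tidy(): the list names become the "var" column.
with_id <- map_df(models, \(m) tidy(m), .id = "var")

# Same idea with the newer interface (assumes purrr >= 1.0.0):
# list_rbind() takes the name of the new column in names_to.
with_names_to <- models |>
  map(\(m) tidy(m)) |>
  list_rbind(names_to = "var")

names(with_id)
```

Any further arguments that really belong to tidy() (for example conf.int = TRUE) can go inside the anonymous function; only arguments meant for the row-binding step need to stay outside it.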