I'm using the purrr::map function to iterate over several columns and tidy the result. As a short example, I provide the following code:
library(tidymodels)
library(broom)
> penguins %>%
+ select(where(is.numeric)) %>%
+ map(\(x) lm(x ~ penguins$species, .)) %>%
+ map_df(broom::tidy, .id = "var")
# A tibble: 12 × 6
var term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 bill_length_mm (Intercept) 38.8 0.241 161. 2.47e-322
2 bill_length_mm penguins$speciesChinstrap 10.0 0.432 23.2 4.23e- 72
3 bill_length_mm penguins$speciesGentoo 8.71 0.360 24.2 5.33e- 76
4 bill_depth_mm (Intercept) 18.3 0.0912 201. 0
5 bill_depth_mm penguins$speciesChinstrap 0.0742 0.164 0.453 6.50e- 1
6 bill_depth_mm penguins$speciesGentoo -3.36 0.136 -24.7 7.93e- 78
7 flipper_length_mm (Intercept) 190. 0.540 351. 0
8 flipper_length_mm penguins$speciesChinstrap 5.87 0.970 6.05 3.79e- 9
9 flipper_length_mm penguins$speciesGentoo 27.2 0.807 33.8 1.84e-110
10 body_mass_g (Intercept) 3701. 37.6 98.4 2.49e-251
11 body_mass_g penguins$speciesChinstrap 32.4 67.5 0.480 6.31e- 1
12 body_mass_g penguins$speciesGentoo 1375. 56.1 24.5 5.42e- 77
This works as expected.
However, when I map functions with additional arguments, I usually use an anonymous function, as suggested in the documentation. When I try that in this example, changing only the last line of the previous code, I get the tidy table with all regression results, but without the "var" column that tells me which variable was used in each regression:
> penguins %>%
+ select(where(is.numeric)) %>%
+ map(\(x) lm(x ~ penguins$species, .)) %>%
+ map_df(\(x) broom::tidy(x, .id = "var"))
# A tibble: 12 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 38.8 0.241 161. 2.47e-322
2 penguins$speciesChinstrap 10.0 0.432 23.2 4.23e- 72
3 penguins$speciesGentoo 8.71 0.360 24.2 5.33e- 76
4 (Intercept) 18.3 0.0912 201. 0
5 penguins$speciesChinstrap 0.0742 0.164 0.453 6.50e- 1
6 penguins$speciesGentoo -3.36 0.136 -24.7 7.93e- 78
7 (Intercept) 190. 0.540 351. 0
8 penguins$speciesChinstrap 5.87 0.970 6.05 3.79e- 9
9 penguins$speciesGentoo 27.2 0.807 33.8 1.84e-110
10 (Intercept) 3701. 37.6 98.4 2.49e-251
11 penguins$speciesChinstrap 32.4 67.5 0.480 6.31e- 1
12 penguins$speciesGentoo 1375. 56.1 24.5 5.42e- 77
The formula shorthand gives the same result:
> penguins %>%
+ select(where(is.numeric)) %>%
+ map(\(x) lm(x ~ penguins$species, .)) %>%
+ map_df(~ broom::tidy(.x, .id = "var"))
# A tibble: 12 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 38.8 0.241 161. 2.47e-322
2 penguins$speciesChinstrap 10.0 0.432 23.2 4.23e- 72
3 penguins$speciesGentoo 8.71 0.360 24.2 5.33e- 76
4 (Intercept) 18.3 0.0912 201. 0
5 penguins$speciesChinstrap 0.0742 0.164 0.453 6.50e- 1
6 penguins$speciesGentoo -3.36 0.136 -24.7 7.93e- 78
7 (Intercept) 190. 0.540 351. 0
8 penguins$speciesChinstrap 5.87 0.970 6.05 3.79e- 9
9 penguins$speciesGentoo 27.2 0.807 33.8 1.84e-110
10 (Intercept) 3701. 37.6 98.4 2.49e-251
11 penguins$speciesChinstrap 32.4 67.5 0.480 6.31e- 1
12 penguins$speciesGentoo 1375. 56.1 24.5 5.42e- 77
What is the reason for this behavior?
The problem is that .id = "var" is not an argument for broom::tidy(), but for purrr::map_df(). Under the hood, purrr::map_df() works like purrr::map(), returning a list, but then it calls dplyr::bind_rows() to create a data frame. The .id argument is passed to that function. When you provide .id to bind_rows(), it turns the names of the list into a column with the column name provided in the .id argument. broom::tidy() discards the .id argument unless the tidying method has such an argument. This is why you are missing your column.
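To keep the anonymous function and still get the "var" column, .id has to stay outside the function, as an argument of map_df() itself. The sketch below assumes the same setup as the question (library(tidymodels) providing penguins). The object names models, res and res2 are only illustrative, and conf.int = TRUE stands in for whatever extra argument you actually want to pass to tidy().

```r
library(tidymodels)  # attaches dplyr, purrr, broom and the penguins data, as in the question

# Fit one model per numeric column; the list keeps the column names
models <- penguins %>%
  select(where(is.numeric)) %>%
  map(\(x) lm(x ~ penguins$species))

# .id belongs to map_df(); arguments meant for tidy() go inside the anonymous function
res <- models %>%
  map_df(\(m) broom::tidy(m, conf.int = TRUE), .id = "var")

# Equivalent two-step form that makes the bind_rows() call explicit
res2 <- models %>%
  map(\(m) broom::tidy(m, conf.int = TRUE)) %>%
  bind_rows(.id = "var")
```

Both res and res2 should contain the "var" column built from the list names. In purrr 1.0.0 and later, map_df() is superseded; map() followed by list_rbind(names_to = "var") is the suggested replacement and produces the same kind of column.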