Combine list of dataframes into one dataframe and summarize in one step


I want to combine (reduce) a list of data frames into one data frame and summarize the data in the same step. The data frames come from a simulation, so they all share the same structure: a Group column followed by two value columns whose values differ between runs.

Minimal Reproducible Example

df_list <- list(structure(list(Group = c("A", "B", "C"), Top_Group = c(1L, 
0L, 0L), Efficiency = c(0.464688158128411, 0.652386676520109, 
0.282913417555392)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(Group = c("A", "B", "C"
), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.120292583014816, 
0.0356206290889531, 0.37196880299598)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame")), structure(list(
    Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.261322160949931, 
    0.383351784432307, 0.754808459430933)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame")))

What I Have Tried

I know I could bind the data together, then group and summarize.

library(tidyverse)

df_list %>% 
  bind_rows() %>%
  group_by(Group) %>%
  summarise(Top_Group = sum(Top_Group), Efficiency = max(Efficiency))

#  Group Top_Group Efficiency
#  <chr>     <int>      <dbl>
#1 A             1      0.465
#2 B             2      0.652
#3 C             0      0.755
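For comparison, the same bind-then-summarize pattern can be sketched with data.table. This is only a minimal sketch: `sim_list` is a hypothetical stand-in for `df_list` with made-up values, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical stand-in for df_list: same columns, made-up values.
sim_list <- list(
  data.frame(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L), Efficiency = c(0.4, 0.6, 0.2)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.1, 0.0, 0.3)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.2, 0.3, 0.7))
)

# rbindlist() stacks the list into one data.table; the j expression then
# aggregates each column with its own function, grouped by Group.
dt_summary <- rbindlist(sim_list)[
  , .(Top_Group = sum(Top_Group), Efficiency = max(Efficiency)),
  by = Group
]

print(dt_summary)
```

This is still a bind followed by a grouped aggregation; it just chains both steps into a single expression.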

I was hoping there was some way to use something like reduce. However, I can only get it to work for one column at a time (Top_Group, shown here), and I am unsure how to apply it across all columns (if that is possible) and return a data frame instead of vectors.

df_list %>%
  map(2) %>%
  reduce(`+`)

# [1] 1 2 0
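To make explicit why this attempt returns a bare vector, here is a small sketch of what each step does. `sim_list` is a hypothetical stand-in for `df_list` with made-up values. `map(2)` extracts the second element of each data frame as a plain vector, so the data frame structure is already gone before `reduce` runs. Selecting the column with single brackets keeps a one-column data frame instead.

```r
library(purrr)

# Hypothetical stand-in for df_list: same columns, made-up values.
sim_list <- list(
  data.frame(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L), Efficiency = c(0.4, 0.6, 0.2)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.1, 0.0, 0.3)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.2, 0.3, 0.7))
)

# map(.x, 2) pulls out the second column of each data frame as a plain vector.
top_vectors <- map(sim_list, 2)

# reduce() folds the list pairwise: (v1 + v2) + v3.
top_total <- reduce(top_vectors, `+`)

# Base R equivalent of the two steps above.
top_total_base <- Reduce(`+`, lapply(sim_list, `[[`, 2))

# Single-bracket selection keeps a one-column data frame, so the
# reduced result is a data frame rather than a vector.
top_df <- reduce(map(sim_list, ~ .x["Top_Group"]), `+`)
```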

Expected Output

  Group Top_Group Efficiency
  <chr>     <int>      <dbl>
1 A             1      0.465
2 B             2      0.652
3 C             0      0.755

Solution

  • The OP's code applies a different function to each column (sum for Top_Group, max for Efficiency), so the corresponding elementwise functions (+ and pmax) have to be applied to each column individually inside reduce.

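    library(purrr)
    library(tibble)  # tibble() is not exported by purrr
    # Column 1 (Group) is kept, column 2 is summed, column 3 takes the elementwise maximum
    reduce(df_list, ~ tibble(.x[1], .x[2] + .y[2], pmax(.x[3], .y[3])))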
    

    Output

    # A tibble: 3 × 3
      Group Top_Group Efficiency
      <chr>     <int>      <dbl>
    1 A             1      0.465
    2 B             2      0.652
    3 C             0      0.755
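The positional version above combines rows elementwise, so it relies on every data frame listing the groups in the same order. A possible variant, sketched here under assumptions, refers to columns by name and aligns rows on Group before combining. `combine_two` is a hypothetical helper, `sim_list` is a stand-in for `df_list` with made-up values (its second element is deliberately in a different row order), and the sketch assumes every data frame contains the same set of groups.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for df_list; the second data frame lists the
# groups in a different order to exercise the alignment step.
sim_list <- list(
  data.frame(Group = c("A", "B", "C"), Top_Group = c(1L, 0L, 0L), Efficiency = c(0.4, 0.6, 0.2)),
  data.frame(Group = c("C", "A", "B"), Top_Group = c(0L, 0L, 1L), Efficiency = c(0.3, 0.1, 0.0)),
  data.frame(Group = c("A", "B", "C"), Top_Group = c(0L, 1L, 0L), Efficiency = c(0.2, 0.3, 0.7))
)

# Hypothetical helper: combine the accumulated result with the next data frame.
combine_two <- function(acc, nxt) {
  # Reorder nxt so its rows follow acc's Group order
  # (assumes both contain exactly the same groups).
  nxt <- nxt[match(acc$Group, nxt$Group), ]
  tibble(
    Group      = acc$Group,
    Top_Group  = acc$Top_Group + nxt$Top_Group,        # running sum
    Efficiency = pmax(acc$Efficiency, nxt$Efficiency)  # running elementwise max
  )
}

reduced <- reduce(sim_list, combine_two)
print(reduced)
```

Naming the columns makes it easier to see which function applies to which column, at the cost of a little more code than the positional one-liner.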