Search code examples
rdplyrmagrittr

Applying multiple summarize-operations in one pipeline while "re-using" dropped variables


My problem seems to be simple enough but I can't wrap my head around the problem and obviously don't know the appropriate search terms.

I have a data frame, and I want to apply multiple summarizations. The data looks like this:

set.seed(123)

df <- data.frame(ID = c(1:20),
                 year = sample(2015:2020, 20, replace = T),
                 Age_group = sample(c("20-29", "30-39", "40-49", "50-59"), 20, replace = T),
                 sex = sample(c("male", "female"), 20, replace = T),
                 count = sample(1:100, 20, replace = T))

In a single pipeline, I would like to summarize the data

  1. by ID only, then
  2. by ID and year, then
  3. by ID, year, and age group …

I have looked into the assign function, the %T>% operator and {<<-.}, all of which helped me come a bit closer but no cigar, yet.

In my pseudo-R, it would look like this:

df %>%
  group_by(ID) %>%
  summarize(sum_of_count = sum(count)) %>% -> "df with ID and sum_of_count"
  group_by(ID,year) %>% # keep working with the variable "year" which hasn't been dropped
  summarize(sum_of_count = sum(count)) %>% -> "df with ID, year and sum_of_count"

I would like a separate data.frame for each grouping as I would eventually combine them in a single Excel file on different sheets using the rio::export() function.

I am fine with any solution (i. e., additional packages etc.) that is only one block of code.

Thank you all so much!


Solution

  • EDIT: Assuming that the summarize function is the same for all groups, one could try:

    list(
      df1 = "ID",
      df2 = c("ID","year")
    ) %>% lapply(function(gp){
      df %>% group_by_at(gp) %>% summarize(sum_of_count = sum(count))
    }) %>% rio::export("output_file.xlsx")
    

    here the df1 and df2 will determine the names of the Sheets in the exported outpuf_file.xlsx. Alternatively, the initial list can be left unnamed if the exported Sheet names do not matter.