My problem seems to be simple enough but I can't wrap my head around the problem and obviously don't know the appropriate search terms.
I have a data frame, and I want to apply multiple summarizations. The data looks like this:
set.seed(123)
df <- data.frame(ID = c(1:20),
year = sample(2015:2020, 20, replace = T),
Age_group = sample(c("20-29", "30-39", "40-49", "50-59"), 20, replace = T),
sex = sample(c("male", "female"), 20, replace = T),
count = sample(1:100, 20, replace = T))
In a single pipeline, I would like to summarize the data
I have looked into the assign
function, the %T>%
operator and {<<-.}
, all of which helped me come a bit closer but no cigar, yet.
In my pseudo-R, it would look like this:
df %>%
group_by(ID) %>%
summarize(sum_of_count = sum(count)) %>% -> "df with ID and sum_of_count"
group_by(ID,year) %>% # keep working with the variable "year" which hasn't been dropped
summarize(sum_of_count = sum(count)) %>% -> "df with ID, year and sum_of_count"
I would like a separate data.frame for each grouping as I would eventually combine them in a single Excel file on different sheets using the rio::export()
function.
I am fine with any solution (i. e., additional packages etc.) that is only one block of code.
Thank you all so much!
EDIT: Assuming that the summarize function is the same for all groups, one could try:
list(
df1 = "ID",
df2 = c("ID","year")
) %>% lapply(function(gp){
df %>% group_by_at(gp) %>% summarize(sum_of_count = sum(count))
}) %>% rio::export("output_file.xlsx")
here the df1
and df2
will determine the names of the Sheets in the exported outpuf_file.xlsx
. Alternatively, the initial list
can be left unnamed if the exported Sheet names do not matter.