If I want to get the mean and sum of all the numeric columns using the mtcars
data set, I would use following codes:
group_by(gear) %>%
summarise(across(where(is.numeric), list(mean = mean, sum = sum)))
But if I have missing values in some of the columns, how do I take that into account? Here is a reproducible example:
test.df1 <- data.frame("Year" = sample(2018:2020, 20, replace = TRUE),
"Firm" = head(LETTERS, 5),
"Exporter"= sample(c("Yes", "No"), 20, replace = TRUE),
"Revenue" = sample(100:200, 20, replace = TRUE),
stringsAsFactors = FALSE)
test.df1 <- rbind(test.df1,
data.frame("Year" = c(2018, 2018),
"Firm" = c("Y", "Z"),
"Exporter" = c("Yes", "No"),
"Revenue" = c(NA, NA)))
test.df1 <- test.df1 %>% mutate(Profit = Revenue - sample(20:30, 22, replace = TRUE ))
test.df_summarized <- test.df1 %>% group_by(Firm) %>% summarize(across(where(is.numeric)), list(mean = mean, sum = sum)))
If I would just summarize
each variable separately, I could use the following:
test.df1 %>% group_by(Firm) %>% summarize(Revenue_mean = mean(Revenue, na.rm = TRUE,
Profit_mean = mean(Profit, na.rm = TRUE)
But I am trying to figure out how can I tweak the code I wrote above for mtcars
to the example data set I have provided here.
Because your functions all have a na.rm
argument, you can pass it along with the ...
test.df1 %>% summarize(across(where(is.numeric), list(mean = mean, sum = sum), na.rm = TRUE))
# Year_mean Year_sum Revenue_mean Revenue_sum Profit_mean Profit_sum
# 1 2019.045 44419 162.35 3247 138.25 2765
(I left out the group_by
because it's not specified properly in your code and the example is still well-illustrated without it. Also make sure that your functions are inside across()
.)