Tags: r, dplyr, mean, summarize, na.rm

Using summarize across with multiple functions when there are missing values


If I want to get the mean and sum of all the numeric columns in the mtcars data set, I would use the following code:

  library(dplyr)
  mtcars %>%
    group_by(gear) %>% 
    summarise(across(where(is.numeric), list(mean = mean, sum = sum)))

But if I have missing values in some of the columns, how do I take that into account? Here is a reproducible example:

library(dplyr)
test.df1 <- data.frame("Year" = sample(2018:2020, 20, replace = TRUE),
                       "Firm" = head(LETTERS, 5),
                       "Exporter" = sample(c("Yes", "No"), 20, replace = TRUE),
                       "Revenue" = sample(100:200, 20, replace = TRUE),
                       stringsAsFactors = FALSE)

test.df1 <- rbind(test.df1, 
                    data.frame("Year" = c(2018, 2018),
                               "Firm" = c("Y", "Z"),
                               "Exporter" = c("Yes", "No"),
                               "Revenue" = c(NA, NA)))

test.df1 <- test.df1 %>% mutate(Profit = Revenue - sample(20:30, 22, replace = TRUE))

test.df_summarized <- test.df1 %>% group_by(Firm) %>% summarize(across(where(is.numeric), list(mean = mean, sum = sum)))
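The reason missing values cause trouble here is that mean() and sum() return NA as soon as any input is NA, unless na.rm = TRUE is given. A minimal base-R illustration, using made-up vectors x and y (not from the example data):

```r
# Made-up values: one missing entry among three
x <- c(150, NA, 170)

mean(x)                # NA: a single missing value makes the result NA
sum(x)                 # NA
mean(x, na.rm = TRUE)  # 160: the NA is dropped first
sum(x, na.rm = TRUE)   # 320

# A group whose values are all missing, like firms "Y" and "Z" above
y <- c(NA_real_, NA_real_)
mean(y, na.rm = TRUE)  # NaN: mean of an empty vector
sum(y, na.rm = TRUE)   # 0: sum of an empty vector
```

So even with na.rm = TRUE, an all-missing group does not come back as NA: its mean is NaN and its sum is 0.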

If I were to summarize each variable separately, I could use the following:

test.df1 %>% group_by(Firm) %>% summarize(Revenue_mean = mean(Revenue, na.rm = TRUE),
                                          Profit_mean = mean(Profit, na.rm = TRUE))

But I am trying to figure out how I can adapt the mtcars code above to the example data set provided here.


Solution

  • Because all of your functions have an na.rm argument, you can pass it through the ... argument of across():

    test.df1 %>% summarize(across(where(is.numeric), list(mean = mean, sum = sum), na.rm = TRUE))
    #   Year_mean Year_sum Revenue_mean Revenue_sum Profit_mean Profit_sum
    # 1  2019.045    44419       162.35        3247      138.25       2765
    

    (I left out the group_by because it's not specified properly in your code and the example is still well-illustrated without it. Also make sure that your functions are inside across().)
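For the grouped case, here is a minimal sketch on a small made-up data frame (toy_df and its values are hypothetical, not the question's data). In dplyr 1.1.0 and later, passing extra arguments such as na.rm through the ... of across() is deprecated in favour of anonymous functions, so this sketch puts na.rm = TRUE inside purrr-style ~ lambdas instead:

```r
library(dplyr)

# Hypothetical toy data: firm "Z" has only missing values
toy_df <- data.frame(
  Firm    = c("A", "A", "B", "B", "Z"),
  Revenue = c(100, 120, 150, NA, NA),
  Profit  = c(80, 90, 125, NA, NA)
)

res <- toy_df %>%
  group_by(Firm) %>%
  summarise(across(where(is.numeric),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sum  = ~ sum(.x, na.rm = TRUE))))

res
# Columns: Firm, Revenue_mean, Revenue_sum, Profit_mean, Profit_sum
```

The grouping column Firm is excluded from across() automatically. Note that firm "Z", whose values are all missing, gets NaN for the means and 0 for the sums, which is the same thing that happens to firms "Y" and "Z" in the question's data.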