Tags: r, dplyr, across

Using summarise, across, and quantile functions together


I am trying to use the mtcars dataset to calculate summary statistics. Here is my code:

library(dplyr)

df <- as_tibble(mtcars)


df.sum2 <- df %>%
  select(mpg, cyl, vs, am, gear, carb) %>%
  mutate(across(where(is.factor), as.numeric)) %>%
  summarise(across(
    .cols = everything(),
    .fns = list(
      Min = min,
      Q25 = quantile(., 0.25),
      Median = median,
      Q75 = quantile(., 0.75),
      Max = max,
      Mean = mean,
      StdDev = sd,
      N = n()
    ),
    na.rm = T,
    .names = "{col}_{fn}"
  ))

But I get the following error:

Error: Problem with summarise() input ..1.
x Can't subset columns that don't exist.
x Locations 65, 66, 69, 71, 76, etc. don't exist.
i There are only 6 columns.
i Input ..1 is across(...).

If I remove Q25 = quantile(., 0.25) and Q75 = quantile(., 0.75) from the code above, it works fine. I can also get the expected results with the following code:

df.sum <- df %>%
  select(mpg, cyl, vs, am, gear, carb) %>% # select variables to summarise
  summarise_each(funs(Min = min, 
                      Q25 = quantile (., 0.25), 
                      Median = median, 
                      Q75 = quantile (., 0.75), 
                      Max = max,
                      Mean = mean, 
                      StdDev = sd,
                      N = n()))

However, I want to use across() inside summarise() rather than the deprecated summarise_each() function.


Solution

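  • You need to pass functions, not function calls, to .fns: use an anonymous function or the formula (~) syntax whenever a function needs additional arguments. Written as quantile(., 0.25), the call is evaluated immediately instead of being handed to across() as a function to apply to each column; the same goes for n(). Try: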

    library(dplyr)

    df <- as_tibble(mtcars)

    df.sum2 <- df %>%
      select(mpg, cyl, vs, am, gear, carb) %>%
      mutate(across(where(is.factor), as.numeric)) %>%
      summarise(across(
        .cols = everything(),
        .fns = list(
          Min = min,
          Q25 = ~quantile(., 0.25),  # formula: quantile() runs on each column
          Median = median,
          Q75 = ~quantile(., 0.75),
          Max = max,
          Mean = mean,
          StdDev = sd,
          N = ~n()                   # wrapped so across() gets a function, not a value
        ),
        .names = "{col}_{fn}"
      ))
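The answer mentions anonymous functions as an alternative to the ~ formula syntax. Below is a minimal sketch of that variant, assuming dplyr 1.0 or later. Each statistic is wrapped in function(x), so na.rm = TRUE is passed inside each function rather than through across()'s `...` (recent dplyr releases, 1.1.0 onward, deprecate passing extra arguments that way). The factor-conversion step is omitted because mtcars has no factor columns, and the result name df.sum3 is only illustrative.

```r
library(dplyr)

df <- as_tibble(mtcars)

# Hypothetical name df.sum3; each .fns entry is a real function of one column x
df.sum3 <- df %>%
  select(mpg, cyl, vs, am, gear, carb) %>%
  summarise(across(
    .cols = everything(),
    .fns = list(
      Min    = function(x) min(x, na.rm = TRUE),
      Q25    = function(x) quantile(x, 0.25, na.rm = TRUE),
      Median = function(x) median(x, na.rm = TRUE),
      Q75    = function(x) quantile(x, 0.75, na.rm = TRUE),
      Max    = function(x) max(x, na.rm = TRUE),
      Mean   = function(x) mean(x, na.rm = TRUE),
      StdDev = function(x) sd(x, na.rm = TRUE),
      N      = function(x) length(x)  # row count, including NAs, like n()
    ),
    .names = "{.col}_{.fn}"          # e.g. mpg_Q25, cyl_N
  ))

df.sum3
```

The result has one row and 48 columns (6 variables times 8 statistics). N uses length(x) instead of n(); for each column this counts the same rows, including NAs, and it still works per group after group_by(). On R 4.1 or later, \(x) quantile(x, 0.25) is a shorter spelling of function(x) quantile(x, 0.25).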