
Summarizing one way, then another for what's left


Using iris as an example: after grouping by Species, I want to summarize Sepal.Length by its mean, then summarize all the remaining columns with last(), without naming the remaining columns individually. The desired result is:

# A tibble: 3 x 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa             5.01         3.3          1.4         0.2
2 versicolor         5.94         2.8          4.1         1.3
3 virginica          6.59         3            5.1         1.8

This runs without error:

library(tidyverse)
iris %>% 
  as_tibble %>% 
  group_by(Species) %>% 
  summarise_all(~last(.))

But this doesn't:

iris %>% 
  as_tibble %>% 
  group_by(Species) %>% 
  summarise_all(Sepal.Length = mean(Sepal.Length), ~ last(.))

I've tried using everything() and working with summarise_at and summarise_if, but I haven't stumbled on the right syntax to do this.


Solution

  • Since summarise_at and summarise_all apply the same function(s) to every selected variable, they can't be used here.

    One way to apply different summaries to different columns automatically is to build the expressions with the quoting-and-unquoting technique:

    library(dplyr)
    
    cols <- names(iris)[2:4]  # remaining columns: Sepal.Width, Petal.Length, Petal.Width
    col_syms <- syms(cols)  # turn the column-name strings into symbols
    
    summary_vars <- lapply(col_syms, function(col) {
      expr(last(!!col))  # unevaluated last(<column>), to be evaluated inside summarise()
    })
    names(summary_vars) <- cols  # keep the original column names in the result
    
    iris %>%  
      group_by(Species) %>%
      summarise(Sepal.Length = mean(Sepal.Length), !!!summary_vars)  # !!! splices the list in as separate arguments
    

    You can see what is going to be evaluated by wrapping the dplyr pipeline in rlang::qq_show().
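As a sketch of that inspection step, the snippet below rebuilds summary_vars the same way as the code above and passes the whole pipeline to rlang::qq_show(), which prints the call with the !!! splice expanded instead of running it. It assumes the rlang package is installed (dplyr depends on it). The printed layout can vary between versions, so no output is shown.

```r
library(dplyr)

# Rebuild the named list of last() expressions, as in the answer above
cols <- names(iris)[2:4]
summary_vars <- lapply(syms(cols), function(col) expr(last(!!col)))
names(summary_vars) <- cols

# Print the call after unquoting, without evaluating the pipeline
rlang::qq_show(
  iris %>%
    group_by(Species) %>%
    summarise(Sepal.Length = mean(Sepal.Length), !!!summary_vars)
)
```

Each spliced argument should appear as a named last(<column>) call, which makes it easy to confirm the generated expressions before running the summary for real.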
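A possible alternative to the quoting approach, not part of the original answer and assuming dplyr 1.0.0 or later, is across(), which lets a single summarise() call mix a named summary with a function applied to a selection of columns. In this sketch Sepal.Length is excluded from the selection so that last() only touches the remaining columns; the name result is just an illustrative variable.

```r
library(dplyr)

result <- iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),  # mean for this one column
    across(-Sepal.Length, last)         # last value for every other non-grouping column
  )

result
```

The grouping column Species is never selected by across(), so it does not need to be excluded by hand. Excluding Sepal.Length matters because across() also sees columns created earlier in the same summarise() call.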