Using iris as an example: after grouping by Species, I want to summarize Sepal.Length by its mean, then summarize all the remaining columns by last (without calling out the remaining columns individually). The desired result is:
# A tibble: 3 x 5
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl>
1 setosa             5.01         3.3          1.4         0.2
2 versicolor         5.94         2.8          4.1         1.3
3 virginica          6.59         3            5.1         1.8
This runs without error:
library(tidyverse)
iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  summarise_all(~last(.))
But this doesn't:
iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  summarise_all(Sepal.Length = mean(Sepal.Length), ~ last(.))
I've tried using everything() and working with summarise_at and summarise_if, but I haven't stumbled on the right syntax to do this.
Since summarise_at and summarise_all map the same function to selected variables, they can't be used here.
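For comparison, here is a minimal sketch that assumes dplyr 1.0.0 or later, where across() can be combined with ordinary named summaries inside a single summarise() call. The across() call is not part of the original question or answer; it is shown only to illustrate applying one function to Sepal.Length and another to every remaining non-grouping column.

```r
library(dplyr)

# Assumption: dplyr >= 1.0.0, which provides across().
result <- iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),  # one column summarised by its mean
    across(-Sepal.Length, last)         # every other non-grouping column by its last value
  )

print(result)
```

The grouping column Species is excluded from across() automatically, so only Sepal.Length has to be named.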
One way to apply different summaries to different columns automatically is to build the expressions using the quoting-and-unquoting technique:
library(dplyr)
cols <- names(iris)[2:4]   # select remaining columns
col_syms <- syms(cols)     # create symbols from strings
summary_vars <- lapply(col_syms, function(col) {
  expr(last(!!col))        # expression that should be evaluated in summarise
})
names(summary_vars) <- cols  # new column names (set old names)
iris %>%
  group_by(Species) %>%
  summarise(Sepal.Length = mean(Sepal.Length), !!!summary_vars)  # splice the expressions in
You can see what is going to be evaluated by wrapping the dplyr pipeline in rlang::qq_show().
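As a rough sketch of that inspection step, the pipeline can be passed to rlang::qq_show(), which quotes its argument and prints it with the !!! operators expanded instead of running it. The setup lines repeat the definitions used in the answer so the snippet runs on its own.

```r
library(dplyr)

# Same setup as in the answer: one last() expression per remaining column.
cols <- names(iris)[2:4]
summary_vars <- lapply(syms(cols), function(col) expr(last(!!col)))
names(summary_vars) <- cols

# Prints the pipeline with summary_vars spliced in; nothing is evaluated.
rlang::qq_show(
  iris %>%
    group_by(Species) %>%
    summarise(Sepal.Length = mean(Sepal.Length), !!!summary_vars)
)
```

This makes it easy to check that each spliced argument has the intended column name and expression before running the real summarise() call.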