I am new to tidyverse, conceptually I would like to calculate mean and 90% CI of all columns starts with "ab", grouped by "case". Tried many ways but none seem to work, my actual data has many columns so explicitly list them out is not an option.
library(tidyverse)
dat <- tibble(case= c("case1", "case1", "case2", "case2", "case3"),
abc = c(1, 2, 3, 1, 2),
abe = c(1, 3, 2, 3, 4),
bca = c(1, 6, 3, 8, 9))
dat %>% group_by(`case`) %>%
summarise(mean=mean(select(starts_with("ab"))),
qt=quantile(select(starts_with("ab"), prob=c(0.05, 0.95))))
case abc_mean abe_mean abc_lb abc_ub abe_lb abe_ub
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 case1 1.5 2.0 1.05 1.95 1.10 2.90
2 case2 2.0 2.5 1.10 2.90 2.05 2.95
3 case3 2.0 4.0 2.00 2.00 4.00 4.00
You were very close, just move that select
before the summarise
. We then use summarise_all
, and specify the appropriate functions within funs
.
dat %>%
group_by(case) %>%
select(starts_with('ab')) %>%
summarise_all(funs('mean' = mean, 'ub' = quantile(., .95), 'lb' = quantile(., .05)))
# # A tibble: 3 x 7
# case abc_mean abe_mean abc_ub abe_ub abc_lb abe_lb
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 case1 1.5 2.0 1.95 2.90 1.05 1.10
# 2 case2 2.0 2.5 2.90 2.95 1.10 2.05
# 3 case3 2.0 4.0 2.00 4.00 2.00 4.00
We use summarise_all
instead of summarise
because we wish to perform the same operations on multiple columns. It requires far less typing to use summarise_all
instead of a summarise
call in which we specify each column and each operation separately.