Search code examples
rdplyrtidyverse

(R, dplyr) select multiple columns starts with same string and summarise mean (90% CI) by group


I am new to tidyverse, conceptually I would like to calculate mean and 90% CI of all columns starts with "ab", grouped by "case". Tried many ways but none seem to work, my actual data has many columns so explicitly list them out is not an option.

test data below

library(tidyverse)

dat <- tibble(case= c("case1", "case1", "case2", "case2", "case3"), 
              abc = c(1, 2, 3, 1, 2), 
              abe = c(1, 3, 2, 3, 4), 
              bca = c(1, 6, 3, 8, 9))

below code is what i would like to do conceptually, but doesn't work, obviously

dat %>% group_by(`case`) %>% 
  summarise(mean=mean(select(starts_with("ab"))), 
            qt=quantile(select(starts_with("ab"), prob=c(0.05, 0.95))))

what I would like to get is something like below

case abc_mean abe_mean abc_lb abc_ub abe_lb abe_ub

  <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 case1      1.5      2.0   1.05   1.95   1.10   2.90
2 case2      2.0      2.5   1.10   2.90   2.05   2.95
3 case3      2.0      4.0   2.00   2.00   4.00   4.00

Solution

  • You were very close, just move that select before the summarise. We then use summarise_all, and specify the appropriate functions within funs.

    dat %>%
        group_by(case) %>%
        select(starts_with('ab')) %>%
        summarise_all(funs('mean' = mean, 'ub' = quantile(., .95), 'lb' = quantile(., .05)))
    
    # # A tibble: 3 x 7
    #    case abc_mean abe_mean abc_ub abe_ub abc_lb abe_lb
    #   <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
    # 1 case1      1.5      2.0   1.95   2.90   1.05   1.10
    # 2 case2      2.0      2.5   2.90   2.95   1.10   2.05
    # 3 case3      2.0      4.0   2.00   4.00   2.00   4.00
    

    We use summarise_all instead of summarise because we wish to perform the same operations on multiple columns. It requires far less typing to use summarise_all instead of a summarise call in which we specify each column and each operation separately.