r  group-by  mean  confidence-interval

Multiple confidence intervals by group


I am trying to calculate the mean and 95% confidence interval by group for each of 90+ columns.

Sample data:

Group  A_pre  A_post  B_pre  B_post
0      20     21      20     23
1      30     10      19     11
2      10     53      30     34
1      22     32      25     20
2      34     40      32     30
0      30     50      NA     40
0      39     40      19     20
1      40     NA      20     20
2      50     10      20     10
0      34     23      30     10

library(dplyr)
library(gmodels)
df <- df %>% 
  group_by(group) %>% 
  dplyr::summarize_all(list(~mean(., trim = 0), ~ci(.,)), na.rm=TRUE)

I get the error:

Error in UseMethod("ci") : no applicable method for 'ci' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"

I can get the CI for an individual column with rcompanion, but doing that one column at a time is too time-consuming for 90 columns:

library(rcompanion)
groupwiseMean(x ~ group,
              data   = df,
              conf   = 0.95,
              digits = 3,
              na.rm  = TRUE)

Is there any way around the gmodels error or another way forward?


Solution

  • When you use the %>% pipe, . refers to whatever is being piped in, which here is your whole data frame. That is why ci() ends up being called on a grouped_df. Inside purrr-style lambdas, use .x to refer to the column instead.

    All that said, summarize_all() has been superseded. You might try its replacement, across():

    df %>%
      group_by(group) %>%
      summarize(across(everything(),
        .fns = list(
          mean = ~ mean(.x, na.rm = TRUE, trim = 0),
          ci = ~ ci(.x, na.rm = TRUE)
        )
      ))
    

    It also looks like you are passing na.rm = TRUE through the ... argument of summarize_all(). That argument is not forwarded to lambdas written in ~ notation, so I've moved na.rm = TRUE inside each lambda.

    This is untested, as no sample data was provided, but I think it should work, assuming your data columns are all numeric.
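
One thing to watch for: as far as I know, gmodels::ci() on a numeric vector returns several values at once (the estimate, the lower and upper bounds, and the standard error), so a ci column built this way may not come out as one row per group. Below is a minimal sketch of an alternative that sidesteps this by computing each bound as a single number. It assumes a t-based interval is what you want, types in the sample data from the question with the grouping column lower-cased to group to match the code, and uses a helper named ci_half_width that is made up for this example and is not part of gmodels or dplyr.

```r
library(dplyr)

# Sample data from the question (grouping column lower-cased to `group`).
df <- data.frame(
  group  = c(0, 1, 2, 1, 2, 0, 0, 1, 2, 0),
  A_pre  = c(20, 30, 10, 22, 34, 30, 39, 40, 50, 34),
  A_post = c(21, 10, 53, 32, 40, 50, 40, NA, 10, 23),
  B_pre  = c(20, 19, 30, 25, 32, NA, 19, 20, 20, 30),
  B_post = c(23, 11, 34, 20, 30, 40, 20, 20, 10, 10)
)

# Hypothetical helper (not from any package): half-width of a
# t-based confidence interval for the mean, ignoring NA values.
ci_half_width <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
}

# One row per group; each lambda returns a single number, so every
# input column yields <col>_mean, <col>_lower and <col>_upper.
result <- df %>%
  group_by(group) %>%
  summarize(across(
    everything(),
    list(
      mean  = ~ mean(.x, na.rm = TRUE),
      lower = ~ mean(.x, na.rm = TRUE) - ci_half_width(.x),
      upper = ~ mean(.x, na.rm = TRUE) + ci_half_width(.x)
    )
  ))

print(result)
```

Because across(everything(), ...) skips the grouping column, this should scale to 90+ columns without listing them. Groups with very few non-missing values (for example, only two) will get very wide intervals.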