Two-sample t-test across multiple groups and multiple variables using the tidyverse and broom packages


Given the following data:

df <- data.frame(category = sample(1:3, replace = TRUE, 50),
                testgroup = sample(c('A', 'B'), replace = TRUE, 50),
                var_1 = rnorm(50),
                var_2 = rnorm(50),
                var_3 = rnorm(50)
)

Within each category, I would like to run a two-sample t-test comparing the means of test groups A and B, separately for each of the 3 variables.

Ideally, the solution would use the tidyverse and broom packages.

I have struggled for too long with the split-apply-combine approach, and I suspect a neat solution already exists that takes only a few lines of code.

Thanks a lot for your support!


Solution

  • The general rule of thumb is to get the arguments for the desired function (t.test in this case) in side-by-side columns. In your case, we aim to have A and B side by side:

    library(tidyverse)

    # One list-column of values per category/testgroup, then A and B side by side
    X <- df %>% group_by( category, testgroup ) %>%
        summarize( across(starts_with("var"), list) ) %>%
        ungroup() %>%
        pivot_longer( starts_with("var"), names_to="variable", values_to="values" ) %>%
        pivot_wider( names_from="testgroup", values_from="values" )
    # # A tibble: 9 x 4
    #   category variable A          B
    #      <int> <chr>    <list>     <list>
    # 1        1 var_1    <dbl [3]>  <dbl [3]>
    # 2        1 var_2    <dbl [3]>  <dbl [3]>
    # 3        1 var_3    <dbl [3]>  <dbl [3]>
    # 4        2 var_1    <dbl [11]> <dbl [9]>
    # 5        2 var_2    <dbl [11]> <dbl [9]>
    # ...
    

    We are now well positioned to apply a two-sample t-test and process the results with broom:

    # t.test(x, y) runs Welch's unequal-variance test by default
    X %>% mutate(test   = map2(A, B, t.test),
                 result = map(test, broom::tidy) ) %>%
        unnest( result )
    # # A tibble: 9 x 15
    #    category variable A     B     test  estimate estimate1 estimate2 statistic
    #       <int> <chr>    <lis> <lis> <lis>    <dbl>     <dbl>     <dbl>     <dbl>
    #  1        1 var_1    <dbl… <dbl… <hte…    1.07    0.400    -0.665       1.08
    #  2        1 var_2    <dbl… <dbl… <hte…   -0.376   0.350     0.726      -0.415
    #  3        1 var_3    <dbl… <dbl… <hte…   -0.701  -0.102     0.599      -0.434
    #  4        2 var_1    <dbl… <dbl… <hte…   -0.276  -0.335    -0.0587     -0.531
    #  5        2 var_2    <dbl… <dbl… <hte…    0.727   0.689    -0.0374      1.74
    # ...
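
As a possible alternative to the side-by-side layout above, the same tests can be sketched with the formula interface of t.test on nested long-format data. This is only a sketch under stated assumptions: the balanced toy data, the fixed seed, and the names `res` and `value` are made up for illustration and are not part of the question. Since `testgroup` has the levels A and B, `estimate` should be the mean of A minus the mean of B.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Hypothetical balanced data: 10 rows per category/testgroup combination
df <- data.frame(category  = rep(1:3, each = 20),
                 testgroup = rep(c("A", "B"), times = 30),
                 var_1 = rnorm(60),
                 var_2 = rnorm(60),
                 var_3 = rnorm(60))

res <- df %>%
    # Long format: one row per observation and variable
    pivot_longer(starts_with("var"), names_to = "variable", values_to = "value") %>%
    # One nested data frame per category/variable pair
    nest(data = c(testgroup, value)) %>%
    # Welch two-sample t-test of A vs. B within each nested data frame
    mutate(test   = map(data, ~ t.test(value ~ testgroup, data = .x)),
           result = map(test, tidy)) %>%
    unnest(result) %>%
    select(category, variable, estimate, statistic, p.value, conf.low, conf.high)

res
```

This version avoids the list-columns A and B, at the cost of requiring that every category/variable pair contains observations from both test groups; otherwise t.test stops with an error.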