r dplyr tidyverse across

How to rewrite the same code with the across() function


I wrote the following code:

out %>% group_by(tests0, GROUP) %>% 
  summarise(
            mean0 = mean(score0, na.rm = T),
            stderr0 = std.error(score0, na.rm = T), 
            mean7 = mean(score7, na.rm = T), 
            stederr7 = std.error(score7, na.rm = T),
            diff.std.mean = t.test(score0, score7, paired = T)$estimate, 
            p.value = t.test(score0, score7, paired = T)$p.value, 
            )

and obtained the following output:

 tests0     GROUP    mean0 stderr0 mean7 stederr7 diff.std.mean p.value
   <fct>      <fct>    <dbl>   <dbl> <dbl>    <dbl>         <dbl>   <dbl>
 1 ADAS_CogT0 CONTROL   12.6   0.525  13.6    0.662        -1.15  0.00182
 2 ADAS_CogT0 TRAINING  14.0   0.613  12.6    0.570         1.40  0.00295
 3 PVF_T0     CONTROL   32.1   1.22   31.3    1.45          0.498 0.636  
 4 PVF_T0     TRAINING  31.6   1.37   34.3    1.51         -2.48  0.0102 
 5 ROCF_CT0   CONTROL   29.6   0.893  30.3    0.821        -0.180 0.835  
 6 ROCF_CT0   TRAINING  30.1   0.906  29.5    0.929         0.489 0.615  
 7 ROCF_IT0   CONTROL   12.8   0.563  12.2    0.683         0.580 0.356  
 8 ROCF_IT0   TRAINING  10.9   0.735  12.3    0.768        -1.44  0.0238 
 9 ROCF_RT0   CONTROL   12.1   0.725  12.5    0.797        -0.370 0.598  
10 ROCF_RT0   TRAINING  10.5   0.746  10.9    0.742        -0.534 0.370  
11 SVF_T0     CONTROL   35.5   1.05   34      1.15          1.42  0.107  
12 SVF_T0     TRAINING  34.1   1.04   32.9    1.16          0.962 0.231

If I wanted to do the same with the across() function, what would I need to do to get the same results as the code above? I am stuck: I tried to adapt an example from the answer to the question "Reproduce a complex table with double headers", but I could not get it to fit my case.

Here is the dataset

Below is the approach I would like to use to get the same result. It relies on manipulating .x.

out %>%
  group_by(across(all_of(tests0, GROUP))) %>%
  summarise(across(starts_with('score'),
                   list(mean = ~ mean(.x,na.rm = T),
                        stderr = ~ std.error(.x, na.rm = TRUE),
                        diff.std.mean = ~ t.test(.x, na.rm = T)))$estimate,
                        p.value = ~ t.test(.x, na.rm = T)))$p.value)),.groups = "drop")

Solution

  • One possible workaround (which may or may not help) is to use across() "manually": instead of applying functions one column at a time, take the tibble of selected columns that across() returns and compute on it as a whole. The result is a data frame with deeply nested list columns, so unnest() comes in handy. I also used possibly() to handle the case where two columns are not present: across() can match any number of columns, whereas t.test() needs both x and y arguments.

    Code:

    library(tidyverse)
    
    data <-
      df %>%
      group_by(tests0, GROUP) %>%
      summarize(
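        # across() returns the selected score columns as a tibble; the braces stop
        # %>% from inserting it as a first argument, so `.` can be reused below
        all = list(across(starts_with("score")) %>%
          {
            tibble(
              # paired t-test over the two columns; possibly() returns NA on error
              ttest   = data.frame(possibly(~ reduce(., ~ t.test(.x, .y, paired = TRUE))[c("estimate", 'p.value')], NA)(.)),
              # per-column means, renamed score0 -> mean0, score7 -> mean7
              means   = data.frame(map(., ~ mean(.x, na.rm = TRUE)) %>% set_names(., str_replace(names(.), "\\D+", "mean"))),
              # per-column standard deviations (sd(), not std.error()), renamed to stederr0, stederr7
              stderrs = data.frame(map(., ~ sd(.x, na.rm = TRUE)) %>% set_names(., str_replace(names(.), "\\D+", "stederr")))
            )
```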
          })
      )
    #> `summarise()` has grouped output by 'tests0'. You can override using the `.groups` argument.
    
    
    data %>%
      unnest(all) %>%
      unnest(-c("tests0", "GROUP"))
    #> # A tibble: 2 × 8
    #> # Groups:   tests0 [1]
    #>   tests0     GROUP    estimate p.value mean0 mean7 stederr0 stederr7
    #>   <fct>      <fct>       <dbl>   <dbl> <dbl> <dbl>    <dbl>    <dbl>
    #> 1 ADAS_CogT0 CONTROL     -1.24 0.00471  12.5  13.5     3.72     4.81
    #> 2 ADAS_CogT0 TRAINING     1.40 0.00295  14.0  12.6     4.55     4.15
    

    Created on 2021-11-29 by the reprex package (v2.0.1)
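
A simpler variant may also be worth trying: let across() handle only the per-column statistics and keep the paired t-test as ordinary summarise() expressions. This is a minimal sketch, not the method from the answer above. The data frame below is made up (only the column names come from the question), and std_error() is a hypothetical helper standing in for std.error() so that the example needs nothing beyond dplyr.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: same column names, made-up values.
set.seed(1)
out <- tibble(
  tests0 = factor(rep(c("ADAS_CogT0", "PVF_T0"), each = 20)),
  GROUP  = factor(rep(rep(c("CONTROL", "TRAINING"), each = 10), times = 2)),
  score0 = rnorm(40, mean = 20, sd = 4),
  score7 = rnorm(40, mean = 21, sd = 4)
)

# Hypothetical helper (assumption): standard error of the mean, ignoring NAs.
std_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

res <- out %>%
  group_by(tests0, GROUP) %>%
  summarise(
    # One-column statistics: across() applies each function to each score column.
    # .names strips the "score" prefix, giving mean0, stderr0, mean7, stderr7.
    across(
      starts_with("score"),
      list(mean = ~ mean(.x, na.rm = TRUE), stderr = std_error),
      .names = "{.fn}{sub('score', '', .col)}"
    ),
    # Two-column statistics: the paired t-test needs both columns at once,
    # so it is written outside across().
    diff.std.mean = t.test(score0, score7, paired = TRUE)$estimate,
    p.value       = t.test(score0, score7, paired = TRUE)$p.value,
    .groups = "drop"
  )

res
```

The t-test stays outside across() because across() hands each function a single column as .x, so there is no second column to pair it with. With more than two score columns, the pairs would have to be spelled out explicitly.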