r, testing, purrr, group

How to apply a statistical test to several columns of a dataframe in R


I want to apply this test not only to column x1, as in the example below, but to several columns of df, in this case x1 and x2.

I tried putting this code inside a function and using purrr::map, but I couldn't get it to work.

library(tidyverse)

df <- tibble(skul = c(rep('a',60), rep('b', 64)),
             x1 = sample(1:10, 124, replace = TRUE),
             x2 = sample(1:10, 124, replace = TRUE),
             i_f = c(rep(0, 30), rep(1, 30), rep(0, 32), rep(1, 32)))


lapply(split(df, factor(df$skul)),
       function(x)wilcox.test(data=x, x1 ~ i_f,
                              paired=FALSE))
#> Warning in wilcox.test.default(x = c(10L, 5L, 8L, 4L, 6L, 3L, 10L, 2L, 10L, :
#> cannot compute exact p-value with ties
#> Warning in wilcox.test.default(x = c(3L, 3L, 4L, 9L, 8L, 10L, 5L, 5L, 4L, :
#> cannot compute exact p-value with ties
#> $a
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  x1 by i_f
#> W = 546, p-value = 0.1554
#> alternative hypothesis: true location shift is not equal to 0
#> 
#> 
#> $b
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  x1 by i_f
#> W = 565, p-value = 0.4781
#> alternative hypothesis: true location shift is not equal to 0
Created on 2022-04-13 by the reprex package (v2.0.1)

Solution

  • One way is to loop over the columns of interest in a nested inner lapply after the split, build each formula with reformulate, and apply wilcox.test to it

    out <- lapply(split(df, df$skul), function(x) 
        lapply(setNames(c("x1", "x2"), c("x1", "x2")), function(y)
          wilcox.test(reformulate("i_f", response = y), data = x)))
    

    -output

    > out$a
    $x1
    
        Wilcoxon rank sum test with continuity correction
    
    data:  x1 by i_f
    W = 452, p-value = 0.9822
    alternative hypothesis: true location shift is not equal to 0
    
    
    $x2
    
        Wilcoxon rank sum test with continuity correction
    
    data:  x2 by i_f
    W = 404.5, p-value = 0.5027
    alternative hypothesis: true location shift is not equal to 0
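
To compare the results at a glance, the p-values can be pulled out of the nested list. This is a minimal sketch, assuming a data frame shaped like the one in the question. The seed, the `cols` vector, and `exact = FALSE` (which requests the normal approximation up front and so avoids the ties warning) are additions for illustration, not part of the original answer:

```r
# Hypothetical data with the same layout as the question's df
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  skul = c(rep("a", 60), rep("b", 64)),
  x1   = sample(1:10, 124, replace = TRUE),
  x2   = sample(1:10, 124, replace = TRUE),
  i_f  = c(rep(0, 30), rep(1, 30), rep(0, 32), rep(1, 32))
)

cols <- c("x1", "x2")  # columns to test

# Same nested loop as above: outer over schools, inner over columns
out <- lapply(split(df, df$skul), function(x)
  lapply(setNames(cols, cols), function(y)
    wilcox.test(reformulate("i_f", response = y), data = x, exact = FALSE)))

# Extract the p-value from every htest object.
# Rows are the tested columns, columns are the groups in skul.
p_values <- sapply(out, function(grp) sapply(grp, function(res) res$p.value))
p_values
```

Each element of `out` is an ordinary `htest` object, so other components such as `res$statistic` can be extracted the same way.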
    

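    If we want to use the tidyverse, we can group by skul and run the test across both columns, storing each tidied result in a list column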

    library(dplyr)
    df %>% 
       group_by(skul) %>% 
       summarise(across(c(x1, x2), 
       ~list(broom::tidy(wilcox.test(reformulate("i_f", cur_column()))))))
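
Since the question mentions purrr::map, here is one possible purrr-based sketch. It is an alternative to the approaches above, not a restatement of them: reshape to long format, nest one data frame per school and variable, then map the test over the nested data. It assumes dplyr, tidyr and purrr are installed; the seed, the names `variable`, `value`, `p_value` and `results`, and `exact = FALSE` are choices made for illustration only:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data with the same layout as the question's df
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- tibble(
  skul = c(rep("a", 60), rep("b", 64)),
  x1   = sample(1:10, 124, replace = TRUE),
  x2   = sample(1:10, 124, replace = TRUE),
  i_f  = c(rep(0, 30), rep(1, 30), rep(0, 32), rep(1, 32))
)

results <- df %>%
  # One row per observation and tested column
  pivot_longer(c(x1, x2), names_to = "variable", values_to = "value") %>%
  # One nested data frame per (skul, variable) combination
  nest(data = c(value, i_f)) %>%
  # Run the test on each nested data frame and keep only the p-value
  mutate(p_value = map_dbl(
    data,
    function(d) wilcox.test(value ~ i_f, data = d, exact = FALSE)$p.value
  )) %>%
  select(skul, variable, p_value)

results
```

The result has one row per school and variable, which makes it easy to filter or to adjust the p-values for multiple testing afterwards, for example with `p.adjust()`.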