
How to use pairwise_wilcox_test with apply() function?


I am using pairwise_wilcox_test from the rstatix package on my data.frame:

data1 # shortened data.frame
   Firmicutes Proteobacteria Verrucomicrobiota cls
1     9916885       83115.37            0.0000   1
10    9923240       76759.73            0.0000   1
13    9897778      102222.14            0.0000   1
16    9887923      112077.44            0.0000   1
19    9832122      167423.55          454.1326   1
11    9717375      235007.98        47616.9546   2
14    9820485      150719.87        28794.7347   2
17    9805007       54276.39       140716.5721   2
2     9676859      320811.45         2329.3241   2
20    9636967      363032.82            0.0000   2
12    9581184      400989.93        17825.6204   3
15    9908333       87339.68         4327.6418   3
18    9624107      147003.76       228889.5762   3
21    9899086       67276.26        33638.1295   3
24    9827215      165133.37         7651.6540   3

When I apply it to a specific column, it works fine:

WIL <- rstatix::pairwise_wilcox_test(Firmicutes ~ cls, data = data1, exact = TRUE, p.adjust.method = "bonferroni")

Output:

# A tibble: 3 × 9
  .y.        group1 group2    n1    n2 statistic     p p.adj p.adj.signif
* <chr>      <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 Firmicutes 1      2         12    12        86 0.443 1     ns          
2 Firmicutes 1      3         12    12        71 0.977 1     ns          
3 Firmicutes 2      3         12    12        43 0.101 0.303 ns

Now I want to use apply() to run the test over the entire table (the original table is larger), but the apply() call fails:

WIL <- apply(as.matrix(data1), 2, function(x) {rstatix::pairwise_wilcox_test(x ~ cls, data = data1, exact = TRUE, p.adjust.method = "bonferroni")})

Output:

ℹ In index: 1.
    ℹ With name: V1.
    Caused by error in `pull()`:
    ! Can't extract columns that don't exist.
    ✖ Column `x` doesn't exist.
    Run `rlang::last_trace()` to see where the error occurred.
    Called from: signal_abort(cnd, .file)

I understand that there is no column called "x", but I thought that x was defined by function(x).

Can somebody give me a hint about what I'm doing wrong?

I am fairly new to R and Stack Overflow, so if there is an obvious solution for this, I apologise in advance.

Thank you!


Solution

  • You can't use apply here, because x is the actual vector of values from your data frame, not the name of the column that you wish to test. In any case, the variable x inside the formula x ~ cls does not get substituted (this is always the case with formulas in R), so the function is literally looking for a column called x, which doesn't exist.
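
    Both points can be checked with a small base-R sketch. The data frame toy and the helper make_formula are made up for illustration and are not part of the question's data:

    ```r
    # Hypothetical toy data, for illustration only
    toy <- data.frame(a = c(1.5, 2.5, 3.5, 4.5),
                      b = c(10, 20, 30, 40),
                      cls = c(1, 1, 2, 2))

    # 1) apply() passes the column *values* to the function, not the column name
    seen <- apply(as.matrix(toy[c("a", "b")]), 2,
                  function(x) is.numeric(x) && length(x) == 4)
    print(seen)

    # 2) A formula is not evaluated when it is created, so `x` stays
    #    the literal symbol x, whatever was passed in as the argument
    make_formula <- function(x) x ~ cls
    f <- make_formula(toy$a)
    print(all.vars(f))  # names of the variables the formula refers to
    ```

    The formula still refers to a variable named x, which is why the test function then goes looking for a column of that name in the data.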

    Instead, you can use the column names of interest, and turn each into a correct formula inside lapply. You can then simply bind the results together into a single data frame:

    do.call("rbind", lapply(names(data1)[1:3], function(x) {
      # Build the formula from the column name, e.g. Firmicutes ~ cls
      f <- as.formula(paste(x, "~ cls"))
      rstatix::pairwise_wilcox_test(data = data1, formula = f,
                                    exact = TRUE, p.adjust.method = "bonferroni")
    }))
    #> # A tibble: 9 x 9
    #>   .y.               group1 group2    n1    n2 statistic     p p.adj p.adj.signif
    #>   <chr>             <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>       
    #> 1 Firmicutes        1      2          5     5        25 0.008 0.024 *           
    #> 2 Firmicutes        1      3          5     5        19 0.222 0.666 ns          
    #> 3 Firmicutes        2      3          5     5        10 0.69  1     ns          
    #> 4 Proteobacteria    1      2          5     5         6 0.222 0.666 ns          
    #> 5 Proteobacteria    1      3          5     5        10 0.69  1     ns          
    #> 6 Proteobacteria    2      3          5     5        15 0.69  1     ns          
    #> 7 Verrucomicrobiota 1      2          5     5         3 0.045 0.135 ns          
    #> 8 Verrucomicrobiota 1      3          5     5         0 0.01  0.029 *           
    #> 9 Verrucomicrobiota 2      3          5     5        11 0.841 1     ns
    

    Created on 2023-09-11 with reprex v2.0.2
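
    As a possible variation, the formulas could also be built with base R's reformulate() instead of pasting strings. This is only a sketch: the names cols and formulas are introduced here for illustration, and it assumes the column names are syntactically valid R names, as they are in the question's data:

    ```r
    # Column names to test, taken from the question's data frame
    cols <- c("Firmicutes", "Proteobacteria", "Verrucomicrobiota")

    # reformulate(termlabels, response) builds `response ~ termlabels`
    formulas <- lapply(cols, function(x) reformulate("cls", response = x))
    names(formulas) <- cols

    print(formulas$Firmicutes)
    ```

    Each element of formulas could then be passed as the formula argument in the lapply call above, in place of as.formula(paste(x, "~ cls")).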


    Data from question in reproducible format

    data1 <- structure(list(Firmicutes = c(9916885L, 9923240L, 9897778L, 9887923L, 
    9832122L, 9717375L, 9820485L, 9805007L, 9676859L, 9636967L, 9581184L, 
    9908333L, 9624107L, 9899086L, 9827215L), Proteobacteria = c(83115.37, 
    76759.73, 102222.14, 112077.44, 167423.55, 235007.98, 150719.87, 
    54276.39, 320811.45, 363032.82, 400989.93, 87339.68, 147003.76, 
    67276.26, 165133.37), Verrucomicrobiota = c(0, 0, 0, 0, 454.1326, 
    47616.9546, 28794.7347, 140716.5721, 2329.3241, 0, 17825.6204, 
    4327.6418, 228889.5762, 33638.1295, 7651.654), cls = c(1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)), 
    class = "data.frame", row.names = c("1", 
    "10", "13", "16", "19", "11", "14", "17", "2", "20", "12", "15", 
    "18", "21", "24"))