r, dunn.test

Am I following the correct procedures with the dunn.test function?


I tested for differences in abundance among sampling sites using kruskal.test. Now I want to find out which pairs of sites differ from each other (multiple comparisons).

The dunn.test function accepts either a data vector together with a categorical grouping vector, or a formula expression as in lm.

I called the function directly on a data frame with several columns, but I have not found an example that confirms this usage is correct.

library(dunn.test)

df <- data.frame(a = runif(5, 1, 20), b = runif(5, 1, 20), c = runif(5, 1, 20))

kruskal.test(df)

dunn.test(df)

My results were:

Kruskal-Wallis chi-squared = 6.02, df = 2, p-value = 0.04929  

Kruskal-Wallis chi-squared = 6.02, df = 2, p-value = 0.05  

      Comparison of df by group                           

      Between 1 and 2   2.050609, 0.0202
      Between 1 and 3  -0.141421, 0.4438
      Between 2 and 3  -2.192031, 0.0142

Solution

  • I took a look at your code, and you are close. The one issue is that you should specify a correction for multiple comparisons via the method argument; the default, method = "none", applies no adjustment.

    Correcting for Multiple Comparisons

    For your example data, I'll use the Benjamini-Yekutieli variant of the False Discovery Rate (FDR). The reasons why I think this is a good performer for your data are beyond the scope of StackOverflow, but you can read more about it and other correction methods here. I also suggest you read the associated papers; most of them are open-access.

    library(dunn.test)
    
    set.seed(711) # set pseudorandom seed
    
    df <- data.frame(a = runif(5,1,20),
                     b = runif(5,1,20), 
                     c = runif(5,1,20))
    
    dunn.test(df, method = "by") # correct for multiple comparisons using "B-Y" procedure
    
    # Output
    data: df and group
    Kruskal-Wallis chi-squared = 3.62, df = 2, p-value = 0.16
    
    
                               Comparison of df by group                           
                                 (Benjamini-Yekuteili)                             
    Col Mean-|
    Row Mean |          1          2
    ---------+----------------------
           2 |   0.494974
             |     0.5689
             |
           3 |  -1.343502  -1.838477
             |     0.2463     0.1815
    
    alpha = 0.05
    Reject Ho if p <= alpha/2
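
    Passing the data frame directly works because dunn.test accepts a list of numeric vectors, one per group. If your real data are in long format (one column of values, one column of site labels), a rough sketch of the equivalent call follows. The names long, values and ind are simply what base R's stack() produces, not anything from your data, and the dunn.test call is guarded in case the package isn't installed:

```r
set.seed(711)  # same seed as above, so the data match

df <- data.frame(a = runif(5, 1, 20),
                 b = runif(5, 1, 20),
                 c = runif(5, 1, 20))

# stack() reshapes wide -> long: 'values' holds the numbers,
# 'ind' is a factor holding the original column name
long <- stack(df)

# The omnibus Kruskal-Wallis test gives the same statistic either way
kw_wide <- kruskal.test(df)
kw_long <- kruskal.test(values ~ ind, data = long)

# Vector + grouping-variable form of dunn.test
# (only run if the package is installed)
if (requireNamespace("dunn.test", quietly = TRUE)) {
  dunn.test::dunn.test(long$values, g = long$ind, method = "by")
}
```

    Both forms describe the same three groups, so pick whichever matches how your data are stored.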
    

    Interpreting the Results

    In each cell, the first row is Dunn's pairwise z test statistic for that comparison (column group minus row group), and the second row is the p-value adjusted for multiple comparisons.
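
    To see how the two rows relate, here is a small base R sketch that recomputes the second row from the first. It assumes the default one-sided p-values (which is why the output says to reject when p <= alpha/2) and uses p.adjust(), base R's Benjamini-Yekutieli adjustment, as a stand-in for whatever dunn.test does internally:

```r
# z statistics copied from the table above (1 vs 2, 1 vs 3, 2 vs 3)
z <- c("1-2" = 0.494974, "1-3" = -1.343502, "2-3" = -1.838477)

# One-sided p-value for each comparison: P(Z >= |z|)
p_raw <- pnorm(abs(z), lower.tail = FALSE)

# Benjamini-Yekutieli adjustment across the three comparisons
p_by <- p.adjust(p_raw, method = "BY")

print(round(p_by, 4))  # approximately 0.5689, 0.2463, 0.1815, as in the table
```

    The adjusted values are never smaller than the raw ones, which is why comparisons that look significant without correction can stop being so afterwards.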

    Notice that, once corrected for multiple comparisons, none of your pairwise tests is significant at an alpha of 0.05: the output's rule is to reject when p <= alpha/2 (that is, 0.025), and the smallest adjusted p-value is 0.1815. This is not surprising, given that each of your example "sites" was drawn from exactly the same distribution. I hope this has been helpful. Happy analyzing!

    P.S. In the future, you should call set.seed() whenever you construct example data frames using runif (or any other kind of pseudorandom number generation), so that others can reproduce your numbers. Also, if you have other questions about statistical analysis, it's better to ask at: https://stats.stackexchange.com/
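
    As a quick illustration of why the seed matters, resetting it before each draw reproduces the same "random" values (711 is simply the seed used above):

```r
set.seed(711)
first_draw <- runif(5, 1, 20)

set.seed(711)  # reset to the same seed
second_draw <- runif(5, 1, 20)

identical(first_draw, second_draw)  # TRUE: same seed, same draws

third_draw <- runif(5, 1, 20)       # no reset, so the random stream continues
identical(first_draw, third_draw)   # FALSE, barring an astronomically unlikely coincidence
```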