Tags: r, statistics, statistical-test, kruskal-wallis

How to run the Kruskal-Wallis or Mann-Whitney Test in R?


Can anyone give me a hint on how to run a Kruskal-Wallis test on the data below?

My objective: to find out whether the growth (agg_rel_abund) of bacteria differs significantly between Forest and Urban for each family.

The code I have tried in R is kruskal.test(Habitat ~ agg_rel_abund, data = my_data), but I know it is wrong because it does not address my objective.

Let me briefly explain about my data :

There are two types of sample: F and W.

When the sample name starts with F, the Habitat is Urban.

When the sample name starts with W, the Habitat is Forest.
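The prefix rule above can be expressed directly in R, for example if the Habitat column ever has to be rebuilt from the sample names. A minimal sketch with base R's startsWith(), assuming the sample names are stored as a character vector; the variable names samples and habitat are hypothetical:

```r
# Sample names as they appear in the table below (hypothetical vector name)
samples <- c("F10", "F2", "F3", "F7", "F8", "W10", "W13", "W3", "W6", "W9")

# Rule from the question: "F" prefix means Urban, "W" prefix means Forest;
# any other prefix is left as NA
habitat <- ifelse(startsWith(samples, "F"), "Urban",
                  ifelse(startsWith(samples, "W"), "Forest", NA))
habitat
```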

It is also fine to use a Mann-Whitney test, or any other non-parametric test, as long as it tells me whether the growth (agg_rel_abund) of bacteria differs significantly between Forest and Urban for each family.

Sample Habitat Family agg_rel_abund
F10 Urban Acetobacteraceae 0
F2 Urban Acetobacteraceae 0
F3 Urban Acetobacteraceae 0
F7 Urban Acetobacteraceae 0.000132118
F8 Urban Acetobacteraceae 0
W10 Forest Acetobacteraceae 0
W13 Forest Acetobacteraceae 0
W3 Forest Acetobacteraceae 0
W6 Forest Acetobacteraceae 0
W9 Forest Acetobacteraceae 0
F10 Urban Bacillaceae 0.00488836
F2 Urban Bacillaceae 0.000924825
F3 Urban Bacillaceae 0.001056943
F7 Urban Bacillaceae 0.002378121
F8 Urban Bacillaceae 0.002906593
W10 Forest Bacillaceae 0.000264236
W13 Forest Bacillaceae 0.027876866
W3 Forest Bacillaceae 0.001585414
W6 Forest Bacillaceae 0.001056943
W9 Forest Bacillaceae 0.004492007
F10 Urban Carnobacteriaceae 0
F2 Urban Carnobacteriaceae 0
F3 Urban Carnobacteriaceae 0
F7 Urban Carnobacteriaceae 0
F8 Urban Carnobacteriaceae 0.000132118
W10 Forest Carnobacteriaceae 0
W13 Forest Carnobacteriaceae 0
W3 Forest Carnobacteriaceae 0.000132118
W6 Forest Carnobacteriaceae 0

Solution

  • This question would be a better fit for Cross Validated.

    If you want to know whether the growth varies with Family, irrespective of Habitat, you can run kruskal.test with agg_rel_abund as the dependent variable and Family as the independent variable.

    kruskal.test(agg_rel_abund ~ Family, data = my_data)
    

    If you are willing to assume that growth does not differ across families, you can pool them and run kruskal.test directly with agg_rel_abund as the dependent variable and Habitat as the independent variable.

    kruskal.test(agg_rel_abund ~ Habitat, data = my_data)
    
    Kruskal-Wallis rank sum test
    
    data:  agg_rel_abund by Habitat
    Kruskal-Wallis chi-squared = 0.0051556, df = 1, p-value = 0.9428
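Because Habitat has only two levels, this pooled comparison is exactly the setting of the Mann-Whitney (Wilcoxon rank-sum) test, and wilcox.test() gives the same p-value as kruskal.test() when it uses the normal approximation without a continuity correction. A minimal sketch on the Bacillaceae rows from the question only (the data frame name bac is hypothetical); exact = FALSE is set because the values contain ties:

```r
# Bacillaceae rows from the question's table, typed in by hand (subset for illustration)
bac <- data.frame(
  Habitat = rep(c("Urban", "Forest"), each = 5),
  agg_rel_abund = c(0.00488836, 0.000924825, 0.001056943, 0.002378121, 0.002906593,
                    0.000264236, 0.027876866, 0.001585414, 0.001056943, 0.004492007)
)

# Kruskal-Wallis with two groups: response on the left of ~, grouping on the right
kw <- kruskal.test(agg_rel_abund ~ Habitat, data = bac)

# Mann-Whitney / Wilcoxon rank-sum test.
# exact = FALSE: normal approximation with ties correction (0.001056943 occurs twice)
# correct = FALSE: no continuity correction, so the p-value matches kruskal.test()
mw <- wilcox.test(agg_rel_abund ~ Habitat, data = bac,
                  exact = FALSE, correct = FALSE)

c(kruskal = kw$p.value, mann_whitney = mw$p.value)
```

With the default correct = TRUE, wilcox.test() applies a continuity correction and its p-value is usually slightly larger than the Kruskal-Wallis one.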
    

    For each habitat, you can run kruskal.test to check whether growth differs significantly among families:

    library(dplyr)

    # my_data has the columns Sample, Habitat, Family and agg_rel_abund
    out <- list()
    for (i in unique(my_data$Habitat)) {
      # Compare families within one habitat
      x <- kruskal.test(agg_rel_abund ~ Family,
                        data = my_data[my_data$Habitat == i, ])
      out[[i]] <- data.frame(Habitat = i,
                             Kruskal.Wallis.H = unname(x$statistic),
                             df = unname(x$parameter),
                             Sig = x$p.value)
    }

    # Combine into one data frame with one row per habitat
    out <- bind_rows(out)
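The loop above compares families within each habitat; the question's stated objective is the reverse, Forest versus Urban within each family. A minimal sketch of that comparison, running a Mann-Whitney (Wilcoxon rank-sum) test per family with wilcox.test(). It assumes the data look like the excerpt posted in the question (read in here with read.table()), and the Benjamini-Hochberg adjustment across families is an added assumption, not part of the answer above:

```r
# Rows copied from the excerpt posted in the question
my_data <- read.table(header = TRUE, text = "
Sample Habitat Family agg_rel_abund
F10 Urban Acetobacteraceae 0
F2 Urban Acetobacteraceae 0
F3 Urban Acetobacteraceae 0
F7 Urban Acetobacteraceae 0.000132118
F8 Urban Acetobacteraceae 0
W10 Forest Acetobacteraceae 0
W13 Forest Acetobacteraceae 0
W3 Forest Acetobacteraceae 0
W6 Forest Acetobacteraceae 0
W9 Forest Acetobacteraceae 0
F10 Urban Bacillaceae 0.00488836
F2 Urban Bacillaceae 0.000924825
F3 Urban Bacillaceae 0.001056943
F7 Urban Bacillaceae 0.002378121
F8 Urban Bacillaceae 0.002906593
W10 Forest Bacillaceae 0.000264236
W13 Forest Bacillaceae 0.027876866
W3 Forest Bacillaceae 0.001585414
W6 Forest Bacillaceae 0.001056943
W9 Forest Bacillaceae 0.004492007
F10 Urban Carnobacteriaceae 0
F2 Urban Carnobacteriaceae 0
F3 Urban Carnobacteriaceae 0
F7 Urban Carnobacteriaceae 0
F8 Urban Carnobacteriaceae 0.000132118
W10 Forest Carnobacteriaceae 0
W13 Forest Carnobacteriaceae 0
W3 Forest Carnobacteriaceae 0.000132118
W6 Forest Carnobacteriaceae 0
")

# One data frame per family
fams <- split(my_data, my_data$Family)

# Forest vs Urban within each family.
# exact = FALSE: normal approximation with ties correction, since the data
# contain many tied zeros
results <- data.frame(
  Family = names(fams),
  p.value = vapply(fams, function(d) {
    wilcox.test(agg_rel_abund ~ Habitat, data = d, exact = FALSE)$p.value
  }, numeric(1)),
  row.names = NULL
)

# Adjust for testing several families (Benjamini-Hochberg is an assumed choice)
results$p.adj <- p.adjust(results$p.value, method = "BH")
results
```

With only four or five samples per habitat in each family and many zero abundances, these tests have little power, so a non-significant result should not be read as evidence that the habitats do not differ.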