t.test across a dataframe based on two different groups of factors in R


I have a dataframe of variables for 11 plant species recorded at 2 locations. For each species, I am trying to compare the mean of each variable between the two locations using a t-test (or a Wilcoxon test).

Here are the first few rows of my data:

 SPECIES   LOCATION X.COLONIZATION SPORE_DENSITY   pH  NO3  NH4    P Organic_C      K   Cu    Mn   Zn   BD X.Sand
1   C. comosa    Gauteng             90           387 5.40 8.24 1.35 1.10      0.95  94.40 3.36 84.40 4.72 1.45   68.0
2   C. comosa    Gauteng             84           270 5.25 8.36 1.37 1.20      0.99  94.87 3.39 84.87 4.77 1.36   76.0
3   C. comosa    Gauteng             96           404 5.55 8.19 1.32 1.11      0.94  94.01 3.35 84.01 4.68 1.54   78.0
4   C. comosa Mpumalanga             79           382 5.84 4.05 3.46 3.04      1.55 130.40 0.28 25.43 2.00 1.66   73.6
5   C. comosa Mpumalanga             82           383 5.49 4.45 3.48 3.09      1.53 131.36 0.27 25.35 2.12 1.45   76.5
6   C. comosa Mpumalanga             86           371 6.19 4.43 3.44 3.04      1.58 129.95 0.29 25.45 2.14 1.87   74.9
7  C. distans    Gauteng             80           334 5.48 8.88 1.96 3.33      0.99 130.24 0.99 40.01 3.94 1.55   70.0
8  C. distans    Gauteng             75           409 5.29 8.54 1.99 3.28      0.99 130.28 0.95 40.25 3.89 1.48   79.0
9  C. distans    Gauteng             85           259 5.67 8.63 1.93 3.39      1.02 130.30 0.98 40.12 3.97 1.62   79.0
10 C. distans Mpumalanga             65           326 5.61 6.02 2.65 4.45      2.58 163.25 1.79 53.11 6.11 1.68   72.0
11 C. distans Mpumalanga             79           351 5.43 6.58 2.55 4.49      2.59 163.55 1.78 52.89 6.04 1.63   78.0
12 C. distans Mpumalanga             71           251 5.79 6.24 2.59 4.41      2.59 163.27 1.75 53.03 6.19 1.73   75.0
   X.Silt X.Clay
1      12      9
2      16     13
3      14     14
4       9     10
5      11     16
6      13     16
7       8     11
8      12     15
9      10     16
10      8     10
11     15     14
12     16     12

For instance, for each species, I want to test whether the mean spore density differs significantly between Gauteng and Mpumalanga. Any help, please?


Solution

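  • We group by 'SPECIES' and then use summarise with across on the numeric columns: for each column, subset the values where 'LOCATION' is 'Gauteng' and where it is 'Mpumalanga', apply t.test to the two subsets, and extract the p-value. Note that t.test uses var.equal = FALSE by default, so each comparison is a Welch (unequal variance) t-test.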

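    library(dplyr) # across() requires dplyr >= 1.0.0
    df1 %>%
        group_by(SPECIES) %>%
        # for every numeric column, compare the two locations within the
        # current species and keep only the p-value of the test
        summarise(across(where(is.numeric), ~ 
             t.test(.[LOCATION == 'Gauteng'], .[LOCATION == 'Mpumalanga'])$p.value))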
    # A tibble: 2 x 16
    #  SPECIES   X.COLONIZATION SPORE_DENSITY    pH      NO3        NH4        P  Organic_C        K       Cu        Mn       Zn     BD X.Sand X.Silt X.Clay
    #  <chr>              <dbl>         <dbl> <dbl>    <dbl>      <dbl>    <dbl>      <dbl>    <dbl>    <dbl>     <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
    #1 C. comosa          0.146         0.614 0.149 0.000269    7.27e-8  1.35e-5 0.00000970  2.15e-6  3.12e-7   1.35e-5  7.23e-6 0.219   0.779  0.140  0.474
    #2 C. dista…          0.177         0.667 0.438 0.000624    1.94e-4  2.04e-5 0.00000670  4.48e-6  1.22e-6   1.90e-8  2.07e-5 0.0653  0.791  0.363  0.359
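
Since the question also mentions the Wilcoxon test, here is a minimal sketch of the same pattern with wilcox.test swapped in for t.test. It rebuilds a small subset of the posted data so that it runs on its own; the names `df_small` and `wilcox_p` are made up for illustration and are not part of the original answer. Keep in mind that with only three observations per group, the exact two-sided rank-sum p-value cannot go below 0.1, so this test cannot reach significance at the 5% level on samples of this size.

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

# Subset of the question's data (values copied from the post);
# `df_small` is an illustrative name.
df_small <- data.frame(
  SPECIES = rep(c("C. comosa", "C. distans"), each = 6),
  LOCATION = rep(rep(c("Gauteng", "Mpumalanga"), each = 3), times = 2),
  X.COLONIZATION = c(90, 84, 96, 79, 82, 86, 80, 75, 85, 65, 79, 71),
  SPORE_DENSITY = c(387, 270, 404, 382, 383, 371, 334, 409, 259, 326, 351, 251)
)

# Same grouping and across() pattern as the t.test version,
# but with the Wilcoxon rank-sum test applied to each numeric column.
wilcox_p <- df_small %>%
  group_by(SPECIES) %>%
  summarise(across(where(is.numeric), ~
    wilcox.test(.x[LOCATION == "Gauteng"],
                .x[LOCATION == "Mpumalanga"])$p.value))

print(wilcox_p)
```

If a column contains tied values, wilcox.test falls back to a normal approximation and emits a warning, so columns such as X.Silt or X.Clay in the full data would need a closer look.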
    

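To look at a single variable in more detail, for example the spore density comparison the question asks about, the formula interface of t.test can be applied to each species separately in base R. The sketch below rebuilds only the three columns it needs so that it runs on its own; `spore`, `tests`, and `spore_results` are illustrative names, not part of the original answer. As in the dplyr code above, t.test runs Welch's unequal variance test by default.

```r
# Three columns of the question's data (values copied from the post).
spore <- data.frame(
  SPECIES = rep(c("C. comosa", "C. distans"), each = 6),
  LOCATION = rep(rep(c("Gauteng", "Mpumalanga"), each = 3), times = 2),
  SPORE_DENSITY = c(387, 270, 404, 382, 383, 371, 334, 409, 259, 326, 351, 251)
)

# One two-sample t-test per species: SPORE_DENSITY split by LOCATION.
tests <- lapply(split(spore, spore$SPECIES), function(d) {
  t.test(SPORE_DENSITY ~ LOCATION, data = d)
})

# Collect the group means and p-values into one row per species.
# The groups are ordered alphabetically, so estimate[1] is Gauteng
# and estimate[2] is Mpumalanga.
spore_results <- data.frame(
  SPECIES = names(tests),
  mean_Gauteng = sapply(tests, function(tt) unname(tt$estimate[1])),
  mean_Mpumalanga = sapply(tests, function(tt) unname(tt$estimate[2])),
  p_value = sapply(tests, function(tt) tt$p.value),
  row.names = NULL
)

print(spore_results)
```

The p-values here should agree with the SPORE_DENSITY column of the summarise output above, since both calls run the same test on the same values.
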
    # data

    df1 <- structure(list(SPECIES = c("C. comosa", "C. comosa", "C. comosa", 
    "C. comosa", "C. comosa", "C. comosa", "C. distans", "C. distans", 
    "C. distans", "C. distans", "C. distans", "C. distans"), LOCATION = c("Gauteng", 
    "Gauteng", "Gauteng", "Mpumalanga", "Mpumalanga", "Mpumalanga", 
    "Gauteng", "Gauteng", "Gauteng", "Mpumalanga", "Mpumalanga", 
    "Mpumalanga"), X.COLONIZATION = c(90L, 84L, 96L, 79L, 82L, 86L, 
    80L, 75L, 85L, 65L, 79L, 71L), SPORE_DENSITY = c(387L, 270L, 
    404L, 382L, 383L, 371L, 334L, 409L, 259L, 326L, 351L, 251L), 
        pH = c(5.4, 5.25, 5.55, 5.84, 5.49, 6.19, 5.48, 5.29, 5.67, 
        5.61, 5.43, 5.79), NO3 = c(8.24, 8.36, 8.19, 4.05, 4.45, 
        4.43, 8.88, 8.54, 8.63, 6.02, 6.58, 6.24), NH4 = c(1.35, 
        1.37, 1.32, 3.46, 3.48, 3.44, 1.96, 1.99, 1.93, 2.65, 2.55, 
        2.59), P = c(1.1, 1.2, 1.11, 3.04, 3.09, 3.04, 3.33, 3.28, 
        3.39, 4.45, 4.49, 4.41), Organic_C = c(0.95, 0.99, 0.94, 
        1.55, 1.53, 1.58, 0.99, 0.99, 1.02, 2.58, 2.59, 2.59), K = c(94.4, 
        94.87, 94.01, 130.4, 131.36, 129.95, 130.24, 130.28, 130.3, 
        163.25, 163.55, 163.27), Cu = c(3.36, 3.39, 3.35, 0.28, 0.27, 
        0.29, 0.99, 0.95, 0.98, 1.79, 1.78, 1.75), Mn = c(84.4, 84.87, 
        84.01, 25.43, 25.35, 25.45, 40.01, 40.25, 40.12, 53.11, 52.89, 
        53.03), Zn = c(4.72, 4.77, 4.68, 2, 2.12, 2.14, 3.94, 3.89, 
        3.97, 6.11, 6.04, 6.19), BD = c(1.45, 1.36, 1.54, 1.66, 1.45, 
        1.87, 1.55, 1.48, 1.62, 1.68, 1.63, 1.73), X.Sand = c(68, 
        76, 78, 73.6, 76.5, 74.9, 70, 79, 79, 72, 78, 75), X.Silt = c(12L, 
        16L, 14L, 9L, 11L, 13L, 8L, 12L, 10L, 8L, 15L, 16L), X.Clay = c(9L, 
        13L, 14L, 10L, 16L, 16L, 11L, 15L, 16L, 10L, 14L, 12L)), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))