Search code examples
ggplot2anovatukey

A ggplot solution for plotting a Tukey test result from the agricolae package in correct (numerical) order on the x axis?


I am plotting some plant and soil data along a transect covering the sampling years 2000-2020, testing simple one-way ANOVA with the excellent agricolae package, and doing simple explorative base plot graphs.

However, I find it difficult to manipulate with the output from the agricolae package and the base plot function, and I would now like to make a nice (preferably ggplot) graph with data in proper order in presentation quality.

My data (named "biochemA") looks like this:

Year Transect Depth NH4_(ug/L)  
2000 1 1 391,777  
2009 1 1 471,616 
2011 1 1 362,964
2013 1 1 348,208 
2015 1 1 234,341 
2017 1 1 768,844
2019 1 1 599,063 
2020 1 1 912,435
2000 2 1 452,272 
2009 2 1 391,134
2011 2 1 285,286 
2013 2 1 755,01 
2015 2 1 376,022 
2017 2 1 1205,095
2019 2 1 2940,163 
2020 2 1 298,487 
2000 3 1 1409,322 
2009 3 1 847,658 
2011 3 1 332,635 
2013 3 1 487,695 
2015 3 1 337,721 
2017 3 1 1702,21 
2019 3 1 2684,409 
2020 3 1 448,644
  • and I have done a really simple one-way ANOVA + Tukey test, using the agricolae package:
modelNH4<-aov(biochemA$NH4_(ug/L) ~ Year, data = biochemA)

out_NH4 <- HSD.test(modelNH4,"Year", group=TRUE,console=TRUE)

plot(out_NH4,main="NH4", xlab="Year", ylab="μg/g soil", las=2)

This gives me a graph like this:

enter image description here

which is fine for exploratory purposes. However, I would like to plot it with the years on the x-axis in chronological order (2020-2000), whereas the base plot function automatically displays the graphs by decreasing means. Also I would like to show standard error and not standard deviations.

The output from the HSD tst looks like this:

> out_bioANH4
$statistics
   MSerror Df     Mean       CV      MSD
  173.8115 38 11.83804 111.3678 26.72761

$parameters
   test name.t ntr StudentizedRange alpha
  Tukey   Year   8          4.53321  0.05

$means
     biochemA1$NH4_.ug.g_soil._       std r       Min       Max
2000                   7.662795  6.182201 5  3.000779 18.067119
2009                   6.314628  3.589682 5  2.905835 12.042757
2011                   4.866242  1.180330 5  3.698008  6.192181
2013                   6.401148  3.330437 5  3.628023 10.752502
2015                   4.286512  1.424005 5  2.777662  6.101243
2017                  14.385232  8.981692 5  6.405698 29.624701
2019                  42.317093 18.145402 5 12.244628 57.310143
2020                   8.470637  4.223832 5  4.573570 15.409000
           Q25       Q50       Q75
2000  3.547272  5.327341  8.371463
2009  3.728181  6.008561  6.887809
2011  4.097738  4.257121  6.086162
2013  3.684275  4.766714  9.174225
2015  3.089635  4.135425  5.328595
2017  9.569202 12.620927 13.705629
2019 38.758582 50.761796 52.510317
2020  5.761486  7.738720  8.870408

$comparison
NULL

$groups
     biochemA1$NH4_.ug.g_soil._ groups
2019                  42.317093      a
2017                  14.385232      b
2020                   8.470637      b
2000                   7.662795      b
2013                   6.401148      b
2009                   6.314628      b
2011                   4.866242      b
2015                   4.286512      b

attr(,"class")
[1] "group"

Solution

  • What you'll probably have to do is take the elements from out_bioANH4 and use them in your custom plot, such as ggplot2.

    If you type out_bioANH4 at the prompt, you'll see that out_bioANH4$means contains a matrix of the descriptive statistics of your data. If you use ggplot2, you can get similar plots as the agricolae package using geom_errorbar.

    See what this gives you:

    library(ggplot2)
    
    #this calculates SE from the summary statistics and adds it to the summary matrix
    out_bioANH4$means$se <- out_bioANH4$means$std/sqrt(out_bioANH4$means$r)
    
    ggplot(data=out_bioANH4$means) +
    #this calculates ymin and ymax, which are the location of mean +/- SE
      geom_errorbar(aes(ymin=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ - out_bioANH4$means$se, 
                        ymax=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ + out_bioANH4$means$se, 
                        x=row.names(out_bioANH4$means), color=out_bioANH4$groups$groups), width=0) +
    #this puts points at the means of the graph and does a different color for each
      geom_point(aes(y=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._,
                       x=row.names(out_bioANH4$means), 
      color=out_bioANH4$groups$groups[order(row.names(out_NH4$groups))])) +
      xlab("Year") +
      ylab("NH4_(ug/L)") +
      scale_colour_discrete(name="Group")
    

    I couldn't reproduce your results with HSD using your raw data in your "biochemA" example. If your out_bioANH4 example is correct, this ggplot2 stuff should get you started.

    Because the x in the geom_errorbar is in numerical order, the x axis should be ordered in increasing order. If you want to reverse it, you can do that easily with ggplot.