Tags: r, statistics, rda, biplot

Vegan RDA and biplot, remove values contributing >10% of variance


I am using the vegan package to run an RDA and want to plot the data with biplot. My data contain hundreds of variables. What I would like to do is apply a cut-off on the variance explained (0.1 in the example below), so that instead of 44 arrows I might only have, say, 8.

library(vegan)            # load vegan
data(varespec)            # example community data (24 sites x 44 species)
vare.pca <- rda(varespec, scale = TRUE)              # PCA (unconstrained RDA) on standardised data
biplot(vare.pca, scaling = 3, display = "species")   # plots all 44 species

## percentage contribution of each species to PC1 ##
x <- sort(round(100 * scores(vare.pca, display = "species", scaling = 0)[, 1]^2, 3),
          decreasing = TRUE)
## plot contribution against rank (1 = largest)
plot(seq_along(x), x, xlab = "Rank", ylab = "% contribution to PC1")

Any help would be appreciated :)


Solution

  • Depending on the size of the data set, you could use either ordistep or ordiR2step to reduce the number of "unimportant" variables in your plot (see https://www.rdocumentation.org/packages/vegan/versions/2.4-2/topics/ordistep). However, these functions use stepwise selection, which needs to be used cautiously. Stepwise selection picks the included variables based on AIC values, R2 values or p-values. It does not select them based on their importance for your research question, and a selected variable does not necessarily have any meaning for the organisms or biochemical interactions involved. Nevertheless, stepwise selection can give an idea of which variables might strongly influence the overall variation in the data set. A simple example is below.

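    library(vegan)
    data(varespec)

    rda0 <- rda(varespec ~ 1, varespec)    # null model: intercept only
    rda1 <- rda(varespec ~ ., varespec)    # full model: every species column as a constraint

    # stepwise selection (forward and backward by default) between the two models
    rdaplotp <- ordistep(rda0, scope = formula(rda1))
    plot(rdaplotp, display = "species", type = "n")   # empty plot frame
    text(rdaplotp, display = "bp")                    # biplot labels of the selected variables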
    

    Using ordistep thus greatly reduces the number of species displayed in the plot (see Fig 1 below). If you want to remove even more variables (which I do not recommend), one option is to look at the biplot scores and discard the variables with the weakest correlation with the principal components:

    sumrda <- summary(rdaplotp)   # summary of the selected model
    sumrda$biplot                 # biplot scores of the retained variables
    

    It would be wise to first decide which question you want to answer and check whether any of the included variables can be left out beforehand. That alone already reduces their number. Minor edit: I am also a bit confused about why you want to remove variables that contribute strongly to the captured variation.

    Fig 1
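If stepwise selection is not wanted at all, the question's own idea can be carried through directly on the unconstrained PCA: rank the species by their percentage contribution to PC1 (the same quantity as `x` in the question) and draw arrows only for the strongest ones. The sketch below is one possible approach, not a vegan feature. It assumes that contribution to PC1 alone is the selection criterion, and it keeps a hypothetical `n_keep <- 8` species. It uses vegan's `scores()` with base R graphics instead of `biplot()`, which, as far as I know, has no argument for plotting only a subset of species.

```r
library(vegan)

data(varespec)
vare.pca <- rda(varespec, scale = TRUE)   # PCA on standardised species

# Percentage contribution of each species to PC1. With scaling = 0 the
# species scores are unit-length eigenvectors, so the squares sum to 1.
contrib <- 100 * scores(vare.pca, display = "species", scaling = 0)[, 1]^2
contrib <- sort(contrib, decreasing = TRUE)

# Hypothetical choice: keep the 8 strongest contributors.
# A cut-off works the same way: keep <- names(contrib)[contrib >= 5]
n_keep <- 8
keep <- names(contrib)[seq_len(n_keep)]

# Species scores on PC1 and PC2 with symmetric scaling, as in the question
sp <- scores(vare.pca, display = "species", choices = 1:2, scaling = 3)
sp <- sp[keep, , drop = FALSE]

# Draw only the retained species as arrows from the origin
lim_x <- range(0, sp[, 1]) * 1.2
lim_y <- range(0, sp[, 2]) * 1.2
plot(sp, type = "n", xlim = lim_x, ylim = lim_y, xlab = "PC1", ylab = "PC2")
abline(h = 0, v = 0, lty = 3)
arrows(0, 0, sp[, 1], sp[, 2], length = 0.05, col = "red")
text(sp, labels = rownames(sp), pos = 3, cex = 0.8)
```

Ranking on PC1 only can drop species that matter mostly on PC2. Summing the squared loadings over both axes would rank them on the plotted plane instead. Which criterion fits depends on the question being asked.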