r vegan correspondence-analysis

How do you rescale variables and measure distance from CCA coordinate to centroid in vegan?


I'm a new student in a bioinformatics lab, so please feel free to correct me if anything is wrong.

I have made a CCA using the vegan package in R with the following script:

cca.analysis <- cca(mod ~ genus1 + genus2 + genus3, data)

I'm currently trying to measure the score or contribution of each variable (genus) so that I can determine which one was most influential on community variation in my dataset. I have two issues:

  1. How do you rescale the contribution of each genus irrespective of its frequency relative to the other genera? For example, genus 1 is far more abundant than genus 3, which would mean that it contributes more variation to the analysis.
  2. What script or function in the package would you use to measure the distance from the centroid, in order to find each genus' contribution to variation?

Edit: I have made a reproducible example to give some insight into the question. Here is the genus data:

║ genus_1 ║ genus_2 ║ genus_3 ║
║ 15.635  ║ 10.293  ║ 0       ║
║ 9.7813  ║ 9.0061  ║ 5.4298  ║
║ 15.896  ║ 2.5612  ║ 3.4335  ║
║ 4.0054  ║ 0       ║ 2.0043  ║
║ 15.929  ║ 16.213  ║ 0       ║
║ 11.072  ║ 15.434  ║ 0       ║
║ 12.539  ║ 7.2498  ║ 0       ║
║ 9.1164  ║ 11.526  ║ 2.1649  ║
║ 4.5011  ║ 0       ║ 0       ║
║ 11.66   ║ 13.46   ║ 5.1416  ║

The mod part in the formula I provided corresponds to the following data, which I extracted from a PCoA analysis:

║ Coord_1 ║ Coord_2 ║ Coord_3 ║ Coord_4 ║ Coord_5 ║ Coord_6 ║ Coord_7 ║
║ 0.954   ║ 0.928   ║ 0.952   ║ 1.009   ║ 1.016   ║ 0.943   ║ 1.031   ║
║ 0.942   ║ 1.088   ║ 1.100   ║ 1.015   ║ 1.080   ║ 1.140   ║ 1.002   ║
║ 0.932   ║ 0.989   ║ 1.005   ║ 0.974   ║ 0.990   ║ 1.047   ║ 1.035   ║
║ 0.929   ║ 1.111   ║ 1.094   ║ 0.847   ║ 0.932   ║ 0.940   ║ 1.016   ║
║ 0.947   ║ 1.008   ║ 0.937   ║ 1.055   ║ 1.056   ║ 0.964   ║ 1.022   ║
║ 0.948   ║ 1.054   ║ 0.987   ║ 1.018   ║ 1.017   ║ 0.965   ║ 0.994   ║
║ 0.946   ║ 1.023   ║ 0.911   ║ 1.014   ║ 1.062   ║ 1.076   ║ 1.063   ║
║ 1.041   ║ 1.000   ║ 0.945   ║ 0.872   ║ 1.036   ║ 0.907   ║ 1.029   ║
║ 0.926   ║ 1.107   ║ 1.027   ║ 0.943   ║ 0.993   ║ 1.006   ║ 0.947   ║
║ 1.038   ║ 1.016   ║ 1.008   ║ 1.013   ║ 0.997   ║ 0.891   ║ 0.988   ║

You can plot this in R with the plot function, which should give something like this: [figure: CCA plot]


Solution

  • Actually, the scaling of the constraining variables (genus1 etc.) does not influence their contributions to the model. You can verify this by multiplying one of your constraints by some number (say 10) and comparing the resulting models: they do not change. What will change are the regression coefficients for the constraints, but they are of no interest here (a regression coefficient changes to cancel the effect of the multiplication).
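A minimal sketch of the check described above. It assumes the varespec and varechem data sets bundled with vegan as stand-ins for the question's mod and genus tables, and the object names m1, m2 and chem10 are only illustrative:

```r
library(vegan)

# Stand-in data shipped with vegan (an assumption; substitute your own
# response matrix and genus table).
data(varespec)
data(varechem)

# Original model with three constraints.
m1 <- cca(varespec ~ Al + P + K, data = varechem)

# Same model after multiplying one constraint by 10.
chem10 <- transform(varechem, Al = Al * 10)
m2 <- cca(varespec ~ Al + P + K, data = chem10)

# The constrained eigenvalues, and hence the explained inertia,
# should be equal up to numerical tolerance.
all.equal(m1$CCA$eig, m2$CCA$eig)

# The regression coefficients for the rescaled constraint differ
# between the two models, compensating for the multiplication.
coef(m1)["Al", ]
coef(m2)["Al", ]
```

Only the coefficients for the rescaled variable should differ between the two fits; the ordination itself is expected to stay the same.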

    The key point is: what do you mean by "contribution"? If you mean how much of the total variation in the data each constraint "explains", you can get this from anova(cca.analysis, by = "terms") or, alternatively, from anova(cca.analysis, by = "margin"). The first gives a sequential decomposition of the explained variation, in which the components add up to 100% of the explained variation. The second gives the unique contribution of each term, and those components do not add up to 100%. Up to three components (genera), you can also use the varpart function (for CCA, set the argument chisquare = TRUE; this needs the latest vegan release), which decomposes the total explained variation into unique and joint contributions.

    If you mean something else by "contribution", please explain.
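The two anova decompositions mentioned in this answer can be sketched as follows. This again assumes vegan's bundled varespec and varechem data as stand-ins for the question's data; the names fit, seq_tab, mar_tab and seq_share, and the choice of 199 permutations, are illustrative only:

```r
library(vegan)

# Stand-in data shipped with vegan (an assumption; substitute your own).
data(varespec)
data(varechem)

fit <- cca(varespec ~ Al + P + K, data = varechem)

set.seed(1)  # permutation tests are random; fix the seed for repeatability

# Sequential decomposition: each term is assessed after the terms before it.
seq_tab <- anova(fit, by = "terms", permutations = 199)

# Marginal decomposition: each term is assessed after all the other terms.
mar_tab <- anova(fit, by = "margin", permutations = 199)

seq_tab
mar_tab

# Column 2 holds the inertia of each term; the last row is the residual.
# The sequential components sum to the total constrained inertia.
n_terms <- nrow(seq_tab) - 1
seq_share <- seq_tab[seq_len(n_terms), 2] / sum(fit$CCA$eig)
seq_share
sum(seq_share)
```

In the sequential table the result depends on the order of the terms in the formula, so reordering the genera changes the individual shares but not their sum.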