r, vegan

CCA only showing the first 4 variables


I'm newish to R. Can someone help me understand why my CCA only shows the first 4 environmental variables?

Is it a significance issue or a code problem?

I have the species data in Animal_matrix and I wanted all the environmental variables shown in the plot. I have also tried: Animal_matrix ~ T_height., data = VegData1

It worked when I had 20 sites (which are represented as rows? Correct me if I am wrong), but then I condensed the data down to just 5 "sites".

Call:
cca(formula = Animal_matrix ~ T_Height + T_Stem + T_DBH + G_Alive + G_Dead + ST_Alive + ST_Dead + L_Alive + L_Dead + T_Alive + T_Dead + SB_Alive + SB_Dead, data = VegData1)

Partitioning of scaled Chi-square:
              Inertia Proportion
Total          0.5967          1
Constrained    0.5967          1
Unconstrained  0.0000          0

Eigenvalues, and their contribution to the scaled Chi-square 

Importance of components:
                        CCA1   CCA2    CCA3    CCA4
Eigenvalue            0.3468 0.1419 0.06997 0.03802
Proportion Explained  0.5813 0.2378 0.11726 0.06371
Cumulative Proportion 0.5813 0.8190 0.93629 1.00000

Accumulated constrained eigenvalues
Importance of components:
                        CCA1   CCA2    CCA3    CCA4
Eigenvalue            0.3468 0.1419 0.06997 0.03802
Proportion Explained  0.5813 0.2378 0.11726 0.06371
Cumulative Proportion 0.5813 0.8190 0.93629 1.00000

Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions


Species scores

                             CCA1      CCA2      CCA3      CCA4
Australian Hobby          -1.3828  0.675743  1.996326 -1.206407
Australian Owlet-nightjar  0.9656  0.554703  0.093529  0.187641
....

Site scores (weighted averages of species scores)

        CCA1    CCA2     CCA3    CCA4
row1 -1.3828  0.6757  1.99633 -1.2064
row2 -1.0709 -0.6939  0.26238  2.1019
row3 -0.8794  0.2160 -1.57925 -0.6554
row4  0.7206 -2.7502  0.29279 -1.0531
row5  0.9656  0.5547  0.09353  0.1876


Site constraints (linear combinations of constraining variables)

        CCA1    CCA2     CCA3    CCA4
row1 -1.3828  0.6757  1.99633 -1.2064
row2 -1.0709 -0.6939  0.26238  2.1019
row3 -0.8794  0.2160 -1.57925 -0.6554
row4  0.7206 -2.7502  0.29279 -1.0531
row5  0.9656  0.5547  0.09353  0.1876


Biplot scores for constraining variables

            CCA1    CCA2     CCA3    CCA4
T_Height -0.5567 -0.5619 -0.58734 -0.1714
T_Stem   -0.9023 -0.3507 -0.24253 -0.0640
T_DBH    -0.5926 -0.5503 -0.52225 -0.2708
G_Alive  -0.8090 -0.2172  0.06109  0.5428

Solution

  • It appears that you created a whole lot of dummy variables for things being Alive or Dead. I suspect these are largely collinear to the point of being redundant; you don't have 13 unique variables, just 4.

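    As such data are rank deficient, we can't keep the redundant variables in the model and do the linear algebra on them, so they get removed (aliased). Note also that with only 5 sites the centred constraint matrix can have rank at most 4, so no more than 4 constraints can be fitted however the variables are coded. That is consistent with your output: 4 constrained axes, no unconstrained inertia, and site scores identical to the site constraints.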

    You shouldn't be creating dummy variables by hand for R's modelling functions (in general). Instead, use variables that are factors with the required levels, say G with levels Alive and Dead, and R will work out the dummy variables for you. That said, you might still not be able to include all these Alive/Dead variables in the model if they don't add anything new once you know one of them.
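A minimal base-R sketch of both points above, using made-up data rather than your VegData1 (the names env, G and dummies are hypothetical). It uses plain column centring for simplicity; as far as I know cca() centres with row weights instead, but the limit on the rank is the same.

```r
# Hypothetical data: 13 numeric constraints measured at only 5 sites
set.seed(1)
n_sites <- 5
env <- as.data.frame(matrix(rnorm(n_sites * 13), nrow = n_sites))
names(env) <- paste0("V", 1:13)

# Centring the columns uses up one degree of freedom,
# so the rank cannot exceed n_sites - 1
X <- scale(as.matrix(env), center = TRUE, scale = FALSE)
rank_env <- qr(X)$rank
rank_env

# Hand-made Alive/Dead dummies: G_Dead is just 1 - G_Alive,
# so together with the intercept one of the three columns is redundant
G <- factor(c("Alive", "Dead", "Alive", "Dead", "Alive"),
            levels = c("Alive", "Dead"))
dummies <- data.frame(G_Alive = as.integer(G == "Alive"),
                      G_Dead  = as.integer(G == "Dead"))
mm_dummy <- model.matrix(~ G_Alive + G_Dead, data = dummies)
ncol(mm_dummy)              # number of columns in the design matrix
rank_dummy <- qr(mm_dummy)$rank
rank_dummy                  # smaller than the number of columns

# Letting R code the factor gives one contrast column and no redundancy
mm_factor <- model.matrix(~ G)
colnames(mm_factor)
rank_factor <- qr(mm_factor)$rank
```

On your real fit, I believe vegan also provides an alias() method for cca objects that lists which terms were aliased, which would confirm what was dropped.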