I am running a PCA on Hellinger-transformed species data for multiple sites in a single year (ex: 1995). I want to calculate site scores from those same sites in the next year (1996) by projecting data from 1996 onto the PCA ordination constructed from the first year's data (1995). This is in order to get a measure of distance moved in ordination space from one year to another. Is there an R package that allows me to do this?
As @Richard Telford mentions, you can use the predict
method for class "rda"
if you use rda()
from the vegan package to do this PCA.
Here's an example using the Dutch dune data set, assuming that the first half come from year 1 and the second half of the samples from year 2 (they don't).
It's a little involved to do all the steps of generating the final plot, but the predictions are easy to do; the main thing is that you need to have the same columns in both year's data sets (the same species).
library("vegan")
data(dune)
take <- seq_len(floor(nrow(dune) / 2))
pca <- rda(dune[take, ], scale = TRUE)
Set a default for scaling of scores and extract the observed site scores
scl <- "sites"
scrs <- scores(pca, display = "sites", choices = 1:2, scaling = scl)
Predict from pca
for the left out dune
samples. Note you have to specify type = "wa"
to get site scores.
pred <- predict(pca, newdata = dune[-take, ], type = "sites",
scaling = scl)
When we create the plot we need axis limits than span scores of both plots, in case:
lims <- apply(rbind(scrs, pred[, 1:2]), 2L, range)
(Note pred
contains predictions for all axes, not just the first two; you can adjust this using the rank
argument.)
Now we can plot both sets of scores, year 1 as points and year 2 as arrows drawn from the year 1 to the year 2 prediction:
plot(pca, display = "sites", type = "n", scaling = scl,
xlim = lims[,1], ylim = lims[,2])
arrows(scrs[,1], scrs[,2], pred[,1], pred[,2], col = "blue",
length = 0.1)
points(pca, display = "sites", scaling = scl, pch = 21, bg = "white")
This produces: