Search code examples
rplotpcavegan

PCA biplot of data subset


I'm trying to produce pca biplots for data subsets. Within the same principal components environment I'd like to plot only subsets based on Moisture levels.

# Packages
library(vegan)

# Sample data
data(dune, dune.env)
dd <- cbind(dune.env, dune)

# Runnig PCA
pca1 <- rda(dd[, -c(1:5)], scale=T)

# Plot
plot(pca1, scaling=3)

# Now, instead the plot above, I'd like to get 5 different plots (one per dd$Moisture level) where I see the principal components scores but only for the subsets based on:
levels(dd$Moisture)

Many thanks in advance for any inputs!!


Solution

  • There are somewhat easier ways to do these faceted plots with ordixyplot() in vegan or the ggvegan package (currently alpha so simple things sometimes need to be done by hand).

    Example

    # Packages
    library("vegan")
    library("ggvegan")
    
    # Sample data
    data(dune, dune.env)
    
    # PCA
    pca1 <- rda(dune, scale = TRUE)
    

    The ggvegan version

    As I said, you need to do a bit of munging by hand and the moment, but you get the legend for free with ggplot.

    ## use fortify method to extract scores in ggplot-friendly format
    scrs <- fortify(pca1, scaling = 3)
    ## take only site scores for this
    scrs <- with(scrs, scrs[Score == "sites", ])
    ## add on Moisture variable
    scrs <- cbind(scrs, Moisture = dune.env$Moisture)
    
    ## for all points on one plot, Moisture coded by colour
    plt <- ggplot(scrs, aes(x = Dim1, y = Dim2, colour = Moisture)) + 
             geom_point() + coord_fixed()
    plt
    
    ## to facet that plot into multiple panels
    plt + facet_wrap( ~ Moisture)
    

    The ordixyplot() version

    More of the work is canned within ordixyplot() than with the ggvegan version, but you have to work a little harder to add the key (legend) and I can never remember how to do this with Lattice.

    ## Single plot
    ordixyplot(pca1, data = dune.env, formula = PC1 ~ PC2, group = Moisture)
    
    ## Facet plot
    ordixyplot(pca1, data = dune.env, formula = PC1 ~ PC2 | Moisture)
    

    Base graphics

    For base graphics there is an easier version to colour the points on a single plot. One version is

    scrs <- scores(pca1, display = "sites")
    cols <- c("red","green","blue","orange","purple")
    plot(scrs[,1], scrs[,2], pch = 19, 
         col = cols[as.numeric(as.character(dune.env$Moisture))])
    legend("topright", legend = 1:5, title = "Moisture", pch = 19, 
           col = cols, bty = "n")
    

    You can find out more about doing the plots this way using base graphics in a blog post that I wrote a few years ago.