I'm still new to R and trying to learn how to use the vegan package. I can easily plot its PCA results with the normal plot function, but the problem arises when I want to plot the same data with ggplot2. I know I have to extract the right data from the object I've created, but which parts, and how? The dataset I've been practicing on can be downloaded here: https://drive.google.com/file/d/0B1PQGov60aoudVR3dVZBX1VKaHc/view?usp=sharing The code I've been using to transform the data is this:
library(vegan)
library(dplyr)
library(ggplot2)
library(grid)
data <- read.csv(file = "People.csv", header = TRUE, sep = ",", dec = ".", check.names = FALSE, na.strings = c("NA", "-", "?"))
data2 <- data[, -1]
rownames(data2) <- data[, 1]
data2 <- scale(data2, center = TRUE, scale = apply(data2, 2, sd))
data2.pca <- rda(data2)
This gives me an object I can plot using the basic "plot" and "biplot" functions, but I am at a loss as to how to draw both the PCA plot and the biplot in ggplot2. I would also like to color the data points by group, e.g. sex. Any help would be great.
There is a ggbiplot(...) function in the ggbiplot package, but it only works with objects of class prcomp, princomp, PCA, or lda. plot.rda(...) just locates each case (person) in PC1-PC2 space; biplot.rda(...) adds vectors showing the PC1 and PC2 loadings of each variable in the original dataset. It turns out that plot.rda(...) and biplot.rda(...) use the data produced by summarizing the rda object, not the rda object itself.
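As a side note, vegan also provides a scores() extractor, which should give the same kind of site and species coordinates without going through summary(). That may be handy if smry$sites comes back NULL in your vegan version. A minimal sketch, using vegan's bundled dune data rather than the People.csv file from the question, and assuming the default scaling is what you want:

```r
library(vegan)

# Illustration only: vegan's bundled 'dune' data, not the question's People.csv
data(dune)
pca <- rda(dune, scale = TRUE)   # an unconstrained rda() is a PCA

# 'display' picks the kind of score, 'choices' picks the axes
site_scores    <- scores(pca, display = "sites",   choices = 1:2)
species_scores <- scores(pca, display = "species", choices = 1:2)

df1 <- data.frame(site_scores)     # one row per case, columns PC1 and PC2
df2 <- data.frame(species_scores)  # one row per variable, columns PC1 and PC2
head(df1)
```

The data frames have the same shape as the df1 and df2 built from summary() in the code that follows, so either route should feed the same ggplot calls.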
smry <- summary(data2.pca)
df1 <- data.frame(smry$sites[,1:2]) # PC1 and PC2
df2 <- data.frame(smry$species[,1:2]) # loadings for PC1 and PC2
rda.plot <- ggplot(df1, aes(x=PC1, y=PC2)) +
  geom_text(aes(label=rownames(df1)), size=4) +
  geom_hline(yintercept=0, linetype="dotted") +
  geom_vline(xintercept=0, linetype="dotted") +
  coord_fixed()
rda.plot
rda.biplot <- rda.plot +
  geom_segment(data=df2, aes(x=0, xend=PC1, y=0, yend=PC2),
               color="red", arrow=arrow(length=unit(0.01,"npc"))) +
  geom_text(data=df2,
            aes(x=PC1, y=PC2, label=rownames(df2),
                hjust=0.5*(1-sign(PC1)), vjust=0.5*(1-sign(PC2))),
            color="red", size=4)
rda.biplot
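For coloring the points by group, one possible approach is to attach the grouping variable to the site scores and map it to colour in the point layer only. This is a sketch on vegan's bundled dune and dune.env data, with Management standing in for a variable like sex; the column name group is just an illustrative choice, and none of this comes from the People.csv file:

```r
library(vegan)
library(ggplot2)

# Illustration only: bundled example data, not the question's People.csv
data(dune)
data(dune.env)
pca <- rda(dune, scale = TRUE)

df1 <- data.frame(scores(pca, display = "sites", choices = 1:2))
# Rows of dune and dune.env are in the same order, so this lines up by position
df1$group <- dune.env$Management

rda.groups <- ggplot(df1, aes(x = PC1, y = PC2)) +
  # colour is mapped here, not in ggplot(), so layers built on other
  # data frames (e.g. loadings) do not need a 'group' column
  geom_point(aes(colour = group), size = 3) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  geom_vline(xintercept = 0, linetype = "dotted") +
  coord_fixed()
rda.groups
```

With the question's data, the grouping column would presumably have to be taken from the original data frame and kept out of the scaled matrix passed to rda(); whether the file has such a column is an assumption here.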
If you compare rda.plot and rda.biplot to plot(data2.pca) and biplot(data2.pca), I think you'll see they are the same. Believe it or not, the hardest part by far is getting the text to align properly with respect to the arrows.
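To make the alignment trick less opaque: the expression 0.5*(1-sign(x)) used for hjust and vjust turns the sign of a loading into a justification value, so each label is pushed away from the origin instead of sitting on top of its arrow. A small base R sketch with made-up loadings (the values and the helper name just() are hypothetical, chosen only to cover each sign):

```r
# Hypothetical loadings covering positive, negative and zero values
loadings <- data.frame(PC1 = c(0.8, -0.6, 0.0),
                       PC2 = c(0.3, -0.9, 0.5),
                       row.names = c("a", "b", "c"))

# Same expression as in the geom_text() call above:
# positive -> 0, negative -> 1, zero -> 0.5
just <- function(x) 0.5 * (1 - sign(x))

data.frame(loadings,
           hjust = just(loadings$PC1),
           vjust = just(loadings$PC2))
```

A value of 0 anchors the left (or bottom) edge of the label at the arrow tip, so the text extends to the right (or upward); a value of 1 anchors the opposite edge, and 0.5 centers the label.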