How to add circle around specific points in PCA produced by ggbiplot


I know how to produce a PCA plot through ggbiplot and this package works well.

Now I want to modify some specific points: change their color and size and, above all, add circles around some points without covering them (for example with the geom_encircle() function).

Here is my reproducible example code below:

#load required packages
library(ggplot2)
library(devtools)
library(ggbiplot)

#load dataset
data(iris)

#perform principal component analysis
pca = prcomp(iris[ , 1:4], scale=T)

#define classes, generate & view PCA biplot
class = iris$Species
ggbiplot(pca, obs.scale = 1, var.scale = 1, groups = class, circle = FALSE)+
  geom_point(size = 3,aes(color = class))+
  geom_point(data=iris[iris$Species=="setosa",],pch=21, fill=NA, size=2, colour="black", stroke=2)

However, this raises the following error:

Error in `geom_point()`:
! Problem while computing aesthetics.
i Error occurred in the 5th layer.
Caused by error in `FUN()`:
! object 'xvar' not found
Run `rlang::last_trace()` to see where the error occurred.

I suspect the error occurs because the data passed to geom_point() is not consistent with the PCA data that ggbiplot uses internally, but I don't know how to set the data argument of geom_point() correctly.

I would appreciate any advice or solutions.

Thanks in advance.


Solution

  • You can do this in a hacky way by using ggplot_build() to retrieve the data frame that was constructed by ggbiplot.

    gg0 <- ggbiplot(pca, obs.scale = 1, var.scale = 1, groups = class, circle = FALSE)+
      geom_point(size = 3,aes(color = class))
    ggb <- ggplot_build(gg0)
    

    ggb$data is a list with one data frame per layer of the plot. By poking around a bit we can figure out that the geom_point layer is the last (fourth) one, i.e. ggb$data[[4]]. All we need from it are the x and y coordinates, which we can combine with the original data set. This assumes that the row order is preserved, that no incomplete cases were discarded, etc.

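    library(ggalt)  # provides geom_encircle()

    # attach the plotted coordinates of the point layer to the original data
    my_data <- cbind(iris, ggb$data[[4]][c("x", "y")])
    # keep only the observations to highlight
    m2 <- subset(my_data, Species == "setosa")
    gg0 + 
       geom_encircle(data = m2, aes(x = x, y = y)) +
       geom_point(data=m2, aes(x=x,y=y),
                  pch=21, fill=NA, size=2, colour="black", stroke=2)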
    

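If you would rather not hard-code the layer index 4, the following is a rough sketch of the same idea that looks up the point layer by its geom and checks the row count before binding. The names `species`, `geoms`, `pt`, `pts` and `scores` are made up for this illustration, and it assumes the last GeomPoint layer is the one holding one row per observation. It uses ggplot2's `layer_data()` instead of indexing into `ggb$data`; treat it as an untested variation, not as the way ggbiplot is meant to be used.

```r
library(ggplot2)
library(ggbiplot)

pca <- prcomp(iris[, 1:4], scale. = TRUE)
species <- iris$Species  # illustrative name, plays the role of `class` above

gg0 <- ggbiplot(pca, obs.scale = 1, var.scale = 1,
                groups = species, circle = FALSE) +
  geom_point(size = 3, aes(color = species))

# Class name of the geom used by each layer, e.g. "GeomSegment", "GeomPoint"
geoms <- vapply(gg0$layers, function(l) class(l$geom)[1], character(1))

# Index of the last point layer (assumed to be the geom_point() added above)
pt <- max(which(geoms == "GeomPoint"))

# Computed data for that layer; x and y are the plotted coordinates
pts <- layer_data(gg0, pt)

# Stop early if rows were dropped or duplicated, since cbind() relies on
# the layer data having one row per observation in the original order
stopifnot(nrow(pts) == nrow(iris))
scores <- cbind(iris, pts[c("x", "y")])

# Draw open circles around the setosa points only
gg0 +
  geom_point(data = subset(scores, Species == "setosa"),
             aes(x = x, y = y),
             shape = 21, fill = NA, size = 2, colour = "black", stroke = 2)
```

The row-count check only catches dropped or duplicated rows; it cannot detect a reordering, so it is still worth comparing a few points against the plot by eye.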