How to remove specific group from a plot but plot stays the same in R?


library(FactoMineR)  # provides PCA()
library(dplyr)       # provides %>% and filter(), used further down

data('iris')
pca.irix <- PCA(iris[ ,1:4])
gg <- factoextra::fviz_pca_biplot(X = pca.irix, 
                                  # samples
                                  fill.ind = iris$Species, col.ind = 'black',
                                  pointshape = 21, pointsize = 1.5,
                                  geom.ind = 'point', repel = TRUE,
                                  geom.var = FALSE )

I would like to obtain a plot that is exactly like the one above but without the species setosa. I started with the following, but do not know how to continue:

setosa_wo <- iris %>% 
             filter(Species != 'setosa')

gg + scale_x_continuous(limits = c((-2), 2)) + scale_y_continuous(limits = c((-2), 2))

How can I remove a colored group from the plot while everything else in the plot stays the same?


Solution

  • One way to remove one or more groups from the plot is to filter the data used by its layers. Inspecting gg$layers shows that your PCA plot is composed of six layers, but only the first two use the groups as fill color. I therefore filtered the data of just those two layers, which gives a plot with setosa removed.

    EDIT: Following the suggestion by @DaveArmstrong, I added his code to fix the axis ranges at their original values and additionally restored the original colors.

    library(FactoMineR)
    library(ggplot2)
    
    pca.irix <- PCA(iris[ ,1:4])
    
    gg <- factoextra::fviz_pca_biplot(X = pca.irix, 
                                      # samples
                                      fill.ind = iris$Species, col.ind = 'black',
                                      pointshape = 21, pointsize = 1.5,
                                      geom.ind = 'point', repel = T,
                                      geom.var = FALSE )
    
    # First: get the original axis ranges, before any data is filtered out
    yrg <- ggplot2::layer_scales(gg)$y$range$range
    xrg <- ggplot2::layer_scales(gg)$x$range$range
    
    # Filter the data
    gg$layers[[1]]$data <- dplyr::filter(gg$layers[[1]]$data, Fill. != "setosa")
    gg$layers[[2]]$data <- dplyr::filter(gg$layers[[2]]$data, Fill. != "setosa")
    
    gg + 
      # Set the limits to the original ones
      ggplot2::coord_cartesian(xlim=xrg, ylim=yrg, expand=FALSE) +
      # Add original colors: the default hues of the 2nd and 3rd of the three species
      ggplot2::scale_fill_manual(values = scales::hue_pal()(3)[2:3])
    

    Created on 2020-10-16 by the reprex package (v0.3.0)
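
The layer filtering shown above can be wrapped in a small helper that filters every layer whose data contains the grouping column. This is a minimal sketch and not part of the original answer: `drop_groups` and its arguments are hypothetical names, and a toy ggplot2 plot with a `grp` column stands in for the PCA biplot (for the plot above the column would be `Fill.`). It relies on plot internals (`p$layers`), which may behave differently across ggplot2 versions. Unlike the code above, the sketch keeps the default axis expansion of coord_cartesian, which is a choice made here only.

```r
library(ggplot2)

# Hypothetical helper: drop rows belonging to `groups` from every layer
# whose data is a data frame containing `column`. Other layers are untouched.
# Note: layers are reference objects, so the filtering can also show up in
# the plot object that was passed in; read the axis ranges beforehand.
drop_groups <- function(p, groups, column) {
  for (i in seq_along(p$layers)) {
    d <- p$layers[[i]]$data
    if (is.data.frame(d) && column %in% names(d)) {
      p$layers[[i]]$data <- d[!d[[column]] %in% groups, , drop = FALSE]
    }
  }
  p
}

# Toy data standing in for the PCA scores (assumption, not from the question)
df <- data.frame(x = 1:6, y = c(2, 4, 1, 5, 3, 6),
                 grp = factor(rep(c("a", "b", "c"), each = 2)))
lab <- data.frame(x = 3.5, y = 3.5, label = "centre")

p <- ggplot() +
  geom_point(data = df, aes(x, y, fill = grp), shape = 21, size = 3) +
  geom_text(data = lab, aes(x, y, label = label))

# Read the original data ranges before filtering
xrg <- layer_scales(p)$x$range$range
yrg <- layer_scales(p)$y$range$range

# Default hues for all groups, named by level so they survive the filtering
pal <- setNames(scales::hue_pal()(nlevels(df$grp)), levels(df$grp))

p2 <- drop_groups(p, "a", column = "grp") +
  coord_cartesian(xlim = xrg, ylim = yrg) +
  scale_fill_manual(values = pal[setdiff(names(pal), "a")])
```

Naming the palette by group level avoids having to index the colors by position, so the remaining groups keep their colors whichever group is dropped.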