
Ellipses for groups on PCA from DESeq2


I'd like to add ellipses around my three groups (defined by the variable "outcome") to the following plot. Note that vsd is a DESeq2 object with the factors outcome and batch:

pcaData <- plotPCA(vsd, intgroup = c("outcome", "batch"), returnData = TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color = outcome, shape = batch)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
  geom_text(aes(label = rownames(coldata_WM_D56C)), hjust = 0.5, vjust = -0.8, size = 3) +
  geom_density2d(alpha = 0.5) +
  coord_fixed()

[Image: PCA plot]

I tried adding an ellipse layer, expecting it to inherit the aesthetics from the ggplot() call, but it tried to make an ellipse for each point:

stat_ellipse() +

Too few points to calculate an ellipse

geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?

Computation failed in stat_density2d(): missing value where TRUE/FALSE needed

Suggestions? Thanks in advance.

> dput(pcaData)
structure(list(PC1 = c(-15.646673151638, -4.21111051849254, 13.1215703467274, 
-6.5477433859415, -3.22129766721873, 4.59321517871152, 1.84089686598042, 
37.8415172383233, 40.9996810499267, 37.6089348653721, -24.5520575763498, 
-46.5840253031228, -4.01498554781508, -31.227922394463), PC2 = c(31.2712754127142, 
5.89621557021357, -10.2425538634254, -3.44497747426626, 2.21504480008043, 
0.315695833259479, -4.66467589267529, -4.27504355920903, -1.08666029542243, 
-2.69753368235982, 5.89767436709778, -24.2836532766506, 4.43980653642228, 
0.659385524221137), group = structure(c(4L, 5L, 6L, 7L, 8L, 5L, 
8L, 1L, 2L, 3L, 6L, 9L, 9L, 9L), .Label = c("ctrl : 1", "ctrl : 2", 
"ctrl : 3", "non : 1", "non : 2", "non : 3", "preg : 1", "preg : 2", 
"preg : 3"), class = "factor"), outcome = structure(c(2L, 2L, 
2L, 1L, 1L, 2L, 1L, 3L, 3L, 3L, 2L, 1L, 1L, 1L), .Label = c("preg", 
"non", "ctrl"), class = "factor"), batch = structure(c(1L, 2L, 
3L, 1L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", 
"2", "3"), class = "factor"), name = structure(1:14, .Label = c("D5-R-N-1", 
"D5-R-N-2", "D5-R-N-3", "D5-R-P-1", "D5-R-P-2", "D5-Z-N-1", "D5-Z-P-1", 
"D6-C-T-1", "D6-C-T-2", "D6-C-T-3", "D6-Z-N-1", "D6-Z-P-1", "D6-Z-P-2", 
"D6-Z-P-3"), class = "factor")), .Names = c("PC1", "PC2", "group", 
"outcome", "batch", "name"), row.names = c("D5-R-N-1", "D5-R-N-2", 
"D5-R-N-3", "D5-R-P-1", "D5-R-P-2", "D5-Z-N-1", "D5-Z-P-1", "D6-C-T-1", 
"D6-C-T-2", "D6-C-T-3", "D6-Z-N-1", "D6-Z-P-1", "D6-Z-P-2", "D6-Z-P-3"
), class = "data.frame", percentVar = c(0.47709343625754, 0.0990361123451665
))
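A minimal sketch of why those messages appear, assuming ggplot2's default grouping rule: when no group aesthetic is set, ggplot2 splits each layer into one group per combination of the discrete variables mapped in it (here color = outcome and shape = batch). Decoding outcome and batch from the dput() output above and tabulating the combinations shows that no group has more than three samples. That appears to be too few for stat_ellipse(), which in the ggplot2 source skips groups with fewer than four points, and the single-sample groups are also too small for a 2D density estimate. The names default_groups, outcome and batch are introduced here for illustration.

```r
# Outcome and batch for the 14 samples, in row order, decoded from the dput() output
outcome <- factor(c("non", "non", "non", "preg", "preg", "non", "preg",
                    "ctrl", "ctrl", "ctrl", "non", "preg", "preg", "preg"),
                  levels = c("preg", "non", "ctrl"))
batch <- factor(c(1, 2, 3, 1, 2, 2, 2, 1, 2, 3, 3, 3, 3, 3))

# With color = outcome and shape = batch (both discrete) and no explicit group,
# ggplot2 splits each layer by every outcome/batch combination
default_groups <- table(interaction(outcome, batch, drop = TRUE))
print(default_groups)

# Every combination holds at most 3 samples, several only 1
print(max(default_groups))
```

These combinations match the group column that plotPCA() already returned ("ctrl : 1", "non : 2", and so on).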

Following Maurits Evers's suggestion, I added a group aesthetic, but that drew ellipses for only 2 of the 3 outcome types.
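The missing third ellipse can be explained by counting samples per outcome. This sketch assumes the threshold found in ggplot2's stat_ellipse() source, which skips any group with fewer than four points (it requires n - 1 >= 3). The variable min_points is a name introduced here for illustration, not a ggplot2 parameter.

```r
# Outcome labels for the 14 samples, in row order, decoded from the dput() output
outcome <- factor(c("non", "non", "non", "preg", "preg", "non", "preg",
                    "ctrl", "ctrl", "ctrl", "non", "preg", "preg", "preg"),
                  levels = c("preg", "non", "ctrl"))

per_outcome <- table(outcome)
print(per_outcome)

# Assumption: stat_ellipse() draws nothing for groups with fewer than 4 points
min_points <- 4
too_small <- names(per_outcome)[per_outcome < min_points]
print(too_small)
```

Only ctrl (3 samples) falls below the threshold, so it gets the "Too few points to calculate an ellipse" message while preg (6) and non (5) get ellipses. If an ellipse for ctrl is not essential, one option is to give only the ellipse layer a subset of the data, e.g. stat_ellipse(data = subset(pcaData, outcome != "ctrl")); with three points, any ellipse for ctrl would be poorly determined anyway.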


Solution

  • Since no sample data were provided when this answer was written, here is an example using the built-in faithful dataset.

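    The key is to add a group aesthetic. Without one, ggplot2 groups the data by every combination of the discrete color and shape variables, so stat_ellipse() ends up with groups that are too small to fit an ellipse.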

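    library(ggplot2)
    
    # Generate sample data: the first 10 rows of the built-in faithful dataset,
    # plus a 5-level batch factor (2 rows per batch)
    df <- faithful[1:10, ]
    df$batch <- as.factor(rep(1:5, each = 2))
    
    # This throws a warning/error similar to yours: with no explicit group,
    # ggplot2 groups by every color/shape combination, leaving 1-2 points per group
    # ggplot(df, aes(waiting, eruptions, color = eruptions > 3, shape = batch)) +
    #   geom_point() + stat_ellipse()
    
    # Add a group aesthetic and it works: one ellipse per level of eruptions > 3
    ggplot(df, aes(waiting, eruptions, color = eruptions > 3, shape = batch,
                   group = eruptions > 3)) +
      geom_point() +
      stat_ellipse()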
    


    So in your case, try adding aes(..., group = outcome).
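Applied to the data in the question, the fix might look like the sketch below. It rebuilds only the needed columns of pcaData from the dput() output, labels points with the existing name column (instead of the coldata_WM_D56C object, which is not shown), and leaves out geom_density2d() to keep the example small. Given the group sizes, expect ellipses for preg and non only, plus a "Too few points to calculate an ellipse" message for ctrl.

```r
library(ggplot2)

# Relevant columns of pcaData, rebuilt from the dput() output in the question
pcaData <- data.frame(
  PC1 = c(-15.646673151638, -4.21111051849254, 13.1215703467274,
          -6.5477433859415, -3.22129766721873, 4.59321517871152, 1.84089686598042,
          37.8415172383233, 40.9996810499267, 37.6089348653721, -24.5520575763498,
          -46.5840253031228, -4.01498554781508, -31.227922394463),
  PC2 = c(31.2712754127142, 5.89621557021357, -10.2425538634254,
          -3.44497747426626, 2.21504480008043, 0.315695833259479, -4.66467589267529,
          -4.27504355920903, -1.08666029542243, -2.69753368235982, 5.89767436709778,
          -24.2836532766506, 4.43980653642228, 0.659385524221137),
  outcome = factor(c("non", "non", "non", "preg", "preg", "non", "preg",
                     "ctrl", "ctrl", "ctrl", "non", "preg", "preg", "preg"),
                   levels = c("preg", "non", "ctrl")),
  batch = factor(c(1, 2, 3, 1, 2, 2, 2, 1, 2, 3, 3, 3, 3, 3)),
  name = c("D5-R-N-1", "D5-R-N-2", "D5-R-N-3", "D5-R-P-1", "D5-R-P-2",
           "D5-Z-N-1", "D5-Z-P-1", "D6-C-T-1", "D6-C-T-2", "D6-C-T-3",
           "D6-Z-N-1", "D6-Z-P-1", "D6-Z-P-2", "D6-Z-P-3")
)
# percentVar attribute from the dput(), converted to rounded percentages
percentVar <- round(100 * c(0.47709343625754, 0.0990361123451665))

# group = outcome makes stat_ellipse() work on one outcome at a time
# instead of on every outcome/batch combination
p <- ggplot(pcaData, aes(PC1, PC2, color = outcome, shape = batch, group = outcome)) +
  geom_point(size = 3) +
  geom_text(aes(label = name), hjust = 0.5, vjust = -0.8, size = 3) +
  stat_ellipse() +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
  coord_fixed()
print(p)
```

With the real vsd object, keep the plotPCA(..., returnData = TRUE) call and simply add group = outcome to the existing aes(); geom_density2d() then also computes one density per outcome.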