
R: Color points in PCA based on groups when using autoplot


I have produced a PCA plot in which each point is a cell, positioned according to its expression of various genes. In this plot, I want some of the points drawn in a separate color. I tried to achieve this by creating "groups" that sort the cells by whether or not they express "gene1".

Here's what my data frame looks (gene1, gene2 and cell_1, cell_2 etc. are colnames and rownames):

          gene1      gene2      gene3      gene4      gene5
cell_1   0.0000   0.279204  25.995400  46.171700  94.234100
cell_2   0.0000  23.456000  77.339800 194.241000 301.234000
cell_3   2.0000  13.100000  45.309200   0.776565   0.000000
cell_4   0.0000  10.500000 107.508000   3.032500   0.000000
cell_5   3.0000   0.000000   0.266139   0.762981 123.371000
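For a reproducible example, the table above can be rebuilt as a data frame. This is a sketch with the values copied from the printout; on the real object, `dput(df)` is the usual way to share it.

```r
# Rebuild the example from the question: cells as rows, genes as columns
df <- data.frame(
  gene1 = c(0, 0, 2, 0, 3),
  gene2 = c(0.279204, 23.456, 13.1, 10.5, 0),
  gene3 = c(25.9954, 77.3398, 45.3092, 107.508, 0.266139),
  gene4 = c(46.1717, 194.241, 0.776565, 3.0325, 0.762981),
  gene5 = c(94.2341, 301.234, 0, 0, 123.371),
  row.names = paste0("cell_", 1:5)
)
print(df)
```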

Here's the code I use to try to achieve this:

library(ggplot2)
library(ggfortify)

# Group cells based on expression of a certain gene (to use for color labels in the next step)
groups <- factor(ifelse(df$gene1 > 0, "Positive", "Others"))

#Calculate PCs and plot PCA
autoplot(prcomp(log(df[]+1)), colour="Positive")

When I run this code, I get the following error:

Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'Positive'
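The message itself comes from `grDevices::col2rgb()`. No `data` argument is passed to `autoplot()` here, and the `groups` vector is never referenced, so there is no column called `Positive` to map; the string then appears to be used as a literal colour name. The exact code path inside ggfortify is an assumption here, but the final step can be reproduced in base R:

```r
# A valid colour name converts to RGB without complaint...
ok <- grDevices::col2rgb("red")
# ...but "Positive" is not a colour name, which gives the same error as above
msg <- tryCatch(grDevices::col2rgb("Positive"),
                error = function(e) conditionMessage(e))
print(msg)
```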

Solution

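  • How about this? Add the grouping factor as a column of df, run prcomp() only on the numeric gene columns, then pass df to autoplot() via data= and give colour= the column name: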

    df$groups <- factor(ifelse(df$gene1 > 0, "Positive", "Others"))
    
    head(df)
          gene1     gene2    gene3     gene4      gene5   groups
    1 0.5638534  8.968558 94.40170  62.93106 290.442698 Positive
    2 0.0000000 15.248374 45.87507 204.21703 291.501669   Others
    3 1.9059518 19.488162 75.89302  97.69643 177.833347 Positive
    4 1.9449987  6.358773 54.97159  41.54307 164.835188 Positive
    5 0.0000000 16.568077 31.62370  23.72278  31.774541   Others
    6 1.7199368  3.788276 80.51450 102.82221   6.259461 Positive
    
    autoplot(prcomp(log(df[1:5]+1)), data=df, colour='groups')
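The same idea can be checked with the question's own data and base R graphics. This is a swapped-in alternative to `autoplot()`, not what ggfortify does internally, and the colours in `pal` are arbitrary. It works because `prcomp()` keeps the row order (and row names) of its input, so a grouping column stored in `df` lines up with the PCA scores in `pca$x`:

```r
# Question's data: cells as rows, genes as columns
df <- data.frame(
  gene1 = c(0, 0, 2, 0, 3),
  gene2 = c(0.279204, 23.456, 13.1, 10.5, 0),
  gene3 = c(25.9954, 77.3398, 45.3092, 107.508, 0.266139),
  gene4 = c(46.1717, 194.241, 0.776565, 3.0325, 0.762981),
  gene5 = c(94.2341, 301.234, 0, 0, 123.371),
  row.names = paste0("cell_", 1:5)
)
# Store the grouping factor as a column so it stays attached to each cell
df$groups <- factor(ifelse(df$gene1 > 0, "Positive", "Others"))
# PCA on the numeric gene columns only; log() on the factor column would fail
pca <- prcomp(log(df[1:5] + 1))
# Map each group to a colour and plot the first two PCs
pal <- c(Others = "grey50", Positive = "red")
plot(pca$x[, "PC1"], pca$x[, "PC2"], col = pal[as.character(df$groups)],
     pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = names(pal), col = pal, pch = 19)
```

With ggfortify installed, the equivalent call on this data is `autoplot(pca, data = df, colour = "groups")`.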
    
