
R: Assign colors to row ranges of a data frame when producing a PCA dot plot


I have a data frame df1 like this one (in reality it has thousands of rows):

SampleID  PC1  PC2
SJ-27_SJ-27  0.0246128  0.0188152
SJ-28_SJ-28  0.0286733 -0.0145702
SJ-54_SJ-54  0.0344723  0.0236423
SJ-61_SJ-61  0.0265009  0.0202153
SJ-66_SJ-66  0.0303340  0.0071670
SJ-71_SJ-71 -0.0004866 -0.0037853

Using R, I want to plot PC1 vs PC2, like:

plot(df1[,2], df1[,3])

However, I want to color the points according to their row number. For instance, rows 1-2 in green, rows 3-4 in red, rows 5-6 in grey. The result should look like the image at the following link:

https://www.biostars.org/p/271694/

There must be a very simple way of doing this, but I am not able to find it. Thank you very much.


Solution

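  • In base R, you can add a column (or create a vector) holding a factor that labels each row's group, then pass that factor to col. R converts the factor to its integer codes and uses them to pick colors from the current palette. Using your example data, with two groups of three rows: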

    df1$Grp <- factor(c("A", "A", "A", "B", "B", "B"))
    
    plot(df1$PC1, df1$PC2, col = df1$Grp, pch = 16)
    

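    To get the exact colors asked for in the question (rows 1-2 green, rows 3-4 red, rows 5-6 grey), one option is to build the group labels from row ranges and look the colors up in a named vector. This is a minimal sketch that assumes the six-row example data; the names row_grp, grp_cols and point_cols are made up for illustration, and the breaks would need adjusting for the real data frame:

    ```r
    # Example data copied from the question
    df1 <- data.frame(
      SampleID = c("SJ-27_SJ-27", "SJ-28_SJ-28", "SJ-54_SJ-54",
                   "SJ-61_SJ-61", "SJ-66_SJ-66", "SJ-71_SJ-71"),
      PC1 = c(0.0246128, 0.0286733, 0.0344723, 0.0265009, 0.0303340, -0.0004866),
      PC2 = c(0.0188152, -0.0145702, 0.0236423, 0.0202153, 0.0071670, -0.0037853)
    )

    # Label each row by its range: breaks at 0, 2, 4, 6 give (0,2], (2,4], (4,6]
    row_grp <- cut(seq_len(nrow(df1)), breaks = c(0, 2, 4, 6),
                   labels = c("rows 1-2", "rows 3-4", "rows 5-6"))

    # One color per group (hypothetical names), looked up by group label
    grp_cols <- c("rows 1-2" = "green", "rows 3-4" = "red", "rows 5-6" = "grey")
    point_cols <- unname(grp_cols[as.character(row_grp)])

    plot(df1$PC1, df1$PC2, col = point_cols, pch = 16,
         xlab = "PC1", ylab = "PC2")
    legend("bottomleft", legend = names(grp_cols), col = grp_cols, pch = 16)
    ```

    Looking colors up by name rather than by factor code keeps the group-to-color assignment explicit, so it does not depend on the current palette or on the order of the factor levels.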

    ggplot2 gives you more control over the color mapping and an automatic legend:

    library(ggplot2)
    ggplot(df1, aes(PC1, PC2)) +
      geom_point(aes(color = Grp)) +
      theme_bw()
    

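    With ggplot2, the default colors can be swapped for specific ones with scale_color_manual(). The sketch below assumes three groups of two rows, as in the question, rather than the two groups used above; the labels A, B and C and the object name p are illustrative only:

    ```r
    library(ggplot2)

    # Example data copied from the question
    df1 <- data.frame(
      SampleID = c("SJ-27_SJ-27", "SJ-28_SJ-28", "SJ-54_SJ-54",
                   "SJ-61_SJ-61", "SJ-66_SJ-66", "SJ-71_SJ-71"),
      PC1 = c(0.0246128, 0.0286733, 0.0344723, 0.0265009, 0.0303340, -0.0004866),
      PC2 = c(0.0188152, -0.0145702, 0.0236423, 0.0202153, 0.0071670, -0.0037853)
    )

    # Assumed grouping: rows 1-2 = A, rows 3-4 = B, rows 5-6 = C
    df1$Grp <- factor(rep(c("A", "B", "C"), each = 2))

    # Map each group label to a fixed color; the legend is built automatically
    p <- ggplot(df1, aes(PC1, PC2)) +
      geom_point(aes(color = Grp)) +
      scale_color_manual(values = c(A = "green", B = "red", C = "grey")) +
      theme_bw()

    print(p)
    ```

    Naming the entries of values ties each color to a group label, so the assignment stays the same even if the factor levels are reordered.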