I have a data frame df1 like this one (in reality it has thousands of rows):
SampleID PC1 PC2
SJ-27_SJ-27 0.0246128 0.0188152
SJ-28_SJ-28 0.0286733 -0.0145702
SJ-54_SJ-54 0.0344723 0.0236423
SJ-61_SJ-61 0.0265009 0.0202153
SJ-66_SJ-66 0.0303340 0.0071670
SJ-71_SJ-71 -0.0004866 -0.0037853
Using R, I want to plot PC1 vs PC2, like:
plot(df1[,2], df1[,3])
But I want to color the points according to row number: for instance, rows 1-2 in green, rows 3-4 in red, and rows 5-6 in grey. The result should look like the image at the following link:
https://www.biostars.org/p/271694/
There must be a very simple way of doing this, but I am not able to find it. Thank you very much.
In base R, you can add a column (or create a separate vector) holding a factor of group labels, then pass it to col, which maps each factor level to a color from the current palette. Using your example data:
df1$Grp <- factor(c("A", "A", "A", "B", "B", "B"))
plot(df1$PC1, df1$PC2, col = df1$Grp, pch = 16)
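Passing the factor directly to col picks colors from the default palette, not the green/red/grey asked for in the question. A minimal sketch of one way to get those exact colors by row range follows. The group labels A/B/C, the break points, and the names grp_cols and point_cols are illustrative assumptions, not part of the original answer:

```r
# Example data copied from the question
df1 <- data.frame(
  SampleID = c("SJ-27_SJ-27", "SJ-28_SJ-28", "SJ-54_SJ-54",
               "SJ-61_SJ-61", "SJ-66_SJ-66", "SJ-71_SJ-71"),
  PC1 = c(0.0246128, 0.0286733, 0.0344723, 0.0265009, 0.0303340, -0.0004866),
  PC2 = c(0.0188152, -0.0145702, 0.0236423, 0.0202153, 0.0071670, -0.0037853)
)

# Assign a group by row number: rows 1-2 -> A, rows 3-4 -> B, rows 5+ -> C.
# The breaks are assumptions; adjust them to the real row ranges.
df1$Grp <- cut(seq_len(nrow(df1)),
               breaks = c(0, 2, 4, nrow(df1)),
               labels = c("A", "B", "C"))

# Named lookup from group label to the requested color (hypothetical names)
grp_cols <- c(A = "green", B = "red", C = "grey")
point_cols <- grp_cols[as.character(df1$Grp)]

plot(df1$PC1, df1$PC2, col = point_cols, pch = 16,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = names(grp_cols), col = grp_cols, pch = 16)
```

With thousands of rows, only the breaks vector needs to change; the color lookup stays the same.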
ggplot2 gives you more control over the color mapping and an automatic legend:
library(ggplot2)
ggplot(df1, aes(PC1, PC2)) +
  geom_point(aes(color = Grp)) +
  theme_bw()
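By default ggplot2 chooses its own colors for each level of Grp. To get the specific colors from the question, one option is scale_color_manual() with a named vector. This is a sketch under the assumption of three groups labelled A, B and C covering rows 1-2, 3-4 and 5-6; those labels are illustrative:

```r
library(ggplot2)

# Example data copied from the question (SampleID omitted for brevity)
df1 <- data.frame(
  PC1 = c(0.0246128, 0.0286733, 0.0344723, 0.0265009, 0.0303340, -0.0004866),
  PC2 = c(0.0188152, -0.0145702, 0.0236423, 0.0202153, 0.0071670, -0.0037853)
)

# Assumed grouping: two consecutive rows per group
df1$Grp <- factor(rep(c("A", "B", "C"), each = 2))

p <- ggplot(df1, aes(PC1, PC2)) +
  geom_point(aes(color = Grp)) +
  # Map each group label to the color requested in the question
  scale_color_manual(values = c(A = "green", B = "red", C = "grey")) +
  theme_bw()

print(p)
```

The legend is still generated automatically and uses the manual colors.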