Tags: r, ggplot2, pca

ggplot geom point, modify text


I am trying to perform a PCA in R using prcomp (from the stats package) and plot the result with ggplot2. Sample data is shown below; for each car model there is data in three columns. So far I have been able to generate the plot using the code given below.

df:

> head(car.df)
   honda_1_smp honda_2_smp honda_3_smp audi_1_smp audi_2_smp audi_3_smp merc_1_smp merc_2_smp
s1    0.000289    0.000000    0.076095   0.056965   0.030314   0.000000   0.000000   0.028548
s2    1.588724    1.678821    0.795915   0.552910   0.503845   0.248782   0.201806   2.324172
s3    0.067802    0.068452    0.082904   0.014259   0.038896   0.044144   0.003634   0.167235
s4    0.000000    0.000000    0.000000   0.000000   0.000000   0.008724   0.000000   0.000000
s5    0.822612    1.137569    0.008302   0.025600   0.000000   0.000000   0.000000   0.000000
s6    0.025091    0.096847    0.000000   0.031416   0.024999   0.000000   0.012987   0.000000

Code:

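library(ggplot2)

# Transpose so that each sample column (honda_1_smp, ...) becomes one observation
carpca = prcomp(t(car.df), center = TRUE)
summary(carpca)
# Scores on the first two components, labelled by the full column name
car12 = data.frame(PC1 = carpca$x[, 1], PC2 = carpca$x[, 2], type = rownames(carpca$x))
ggplot(car12, aes(x = PC1, y = PC2, col = type)) +
  geom_point() + geom_text(aes(label = type), hjust = 0, vjust = 0) +
  xlab("PC1 89%") + ylab("PC2 77%") + ggtitle("car")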

Plot: [image not available]

Question

How can I group all my replicate headers under one color and shape, both in the plot and in the legend? That is, every honda replicate should have the same color and shape, and likewise for audi and merc.


Solution

  • I would use a regular expression (gsub) to strip the replicate ID from the "type" column, so that all replicates of a model share a single value:

    car12 = data.frame(PC1 = carpca$x[, 1], PC2 = carpca$x[, 2],
                       type = gsub("_.*$", "", rownames(carpca$x)))
    ggplot(car12, aes(x = PC1, y = PC2, col = type)) +
      geom_point() + geom_text(aes(label = type), hjust = 0, vjust = 0) +
      xlab("PC1 89%") + ylab("PC2 77%") + ggtitle("car")
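To see what the pattern in the gsub call does, here is a small sketch that applies it to the column names from the question. The vector name samples is only for illustration; in the code above the names come from rownames(carpca$x).

```r
# Column names as they appear in the question's data.
samples <- c("honda_1_smp", "honda_2_smp", "honda_3_smp",
             "audi_1_smp", "audi_2_smp", "audi_3_smp",
             "merc_1_smp", "merc_2_smp")

# "_.*$" matches the first underscore and everything after it,
# so only the model name in front of it is kept.
type <- gsub("_.*$", "", samples)
print(type)
# honda honda honda audi audi audi merc merc

# Number of replicates per model
print(table(type))
```

Because every replicate of a model now maps to the same string, ggplot2 treats them as one group and draws a single legend entry per model.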
    

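The code above groups the replicates by color only, while the question also asks for a shared shape. A possible extension, sketched under a few assumptions, maps shape to the same variable and keeps the full replicate name for the text labels. The car.df below is rebuilt from just the six rows shown in the question, so the resulting plot will differ from one made with the full data. The sample column, the computed axis labels, and show.legend = FALSE are additions for illustration, not part of the original answer.

```r
library(ggplot2)

# Toy data: only the six rows printed in the question (the real car.df may be larger).
car.df <- data.frame(
  honda_1_smp = c(0.000289, 1.588724, 0.067802, 0.000000, 0.822612, 0.025091),
  honda_2_smp = c(0.000000, 1.678821, 0.068452, 0.000000, 1.137569, 0.096847),
  honda_3_smp = c(0.076095, 0.795915, 0.082904, 0.000000, 0.008302, 0.000000),
  audi_1_smp  = c(0.056965, 0.552910, 0.014259, 0.000000, 0.025600, 0.031416),
  audi_2_smp  = c(0.030314, 0.503845, 0.038896, 0.000000, 0.000000, 0.024999),
  audi_3_smp  = c(0.000000, 0.248782, 0.044144, 0.008724, 0.000000, 0.000000),
  merc_1_smp  = c(0.000000, 0.201806, 0.003634, 0.000000, 0.000000, 0.012987),
  merc_2_smp  = c(0.028548, 2.324172, 0.167235, 0.000000, 0.000000, 0.000000),
  row.names   = paste0("s", 1:6)
)

carpca <- prcomp(t(car.df), center = TRUE)

# Percentage of variance per component, used for the axis labels
# instead of hard-coded numbers.
var_pct <- round(100 * carpca$sdev^2 / sum(carpca$sdev^2))

car12 <- data.frame(
  PC1    = carpca$x[, 1],
  PC2    = carpca$x[, 2],
  sample = rownames(carpca$x),                  # full name, e.g. "honda_1_smp"
  type   = gsub("_.*$", "", rownames(carpca$x)) # model name, e.g. "honda"
)

# Mapping colour and shape to the same variable gives one merged legend.
p <- ggplot(car12, aes(x = PC1, y = PC2, colour = type, shape = type)) +
  geom_point(size = 3) +
  geom_text(aes(label = sample), hjust = 0, vjust = 0, show.legend = FALSE) +
  xlab(paste0("PC1 ", var_pct[1], "%")) +
  ylab(paste0("PC2 ", var_pct[2], "%")) +
  ggtitle("car")
print(p)
```

Keeping sample and type as separate columns means the points can still be told apart by their labels while the legend shows only honda, audi and merc. If the labels should also show just the model name, use label = type as in the answer above.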