Tags: r, ggplot2, factoextra, factominer

PCA Biplot variable label customization


I am trying to make a PCA biplot from the data below using the following code. The data contain several genotypes belonging to 4 species, and 4 variables (SPAD, Pn, Y(II), DMC) were evaluated under 2 conditions: 1 = control, 2 = stress. I have successfully made the PCA biplot shown in the picture, but I am having trouble with the variable labels shown along with the arrows. I want to convert the suffixes 1 and 2 into the superscripts CT and HS, or at least keep the original names: for example, "Y(II)" always turns into "Y.II." in the graph. I also want to color each variable label as in the Parameters legend, while keeping the Species legend colors to differentiate the species.

[Image: current PCA biplot]

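library(FactoMineR)  # PCA()
library(factoextra)  # fviz_pca_biplot()

# Drop the Species column and use Genotype as row names
pca <- PCA(data.frame(data[,-2], row.names = 1), ncp = 7, graph = TRUE, scale.unit = TRUE)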

SPAD <- "Chlorophyll Index" 
IRGA <- "Gas exchange"
CF <- "Chlorophyll fluorescence"
ag <- "Morphological traits"
traits <- factor(c(SPAD,IRGA,CF,ag,SPAD, IRGA,CF,ag))

fviz_pca_biplot(pca, 
                geom.ind = c("point","text"),
                pointshape = 21,
                pointsize = 2.5,
                fill.ind = data$Species,
                col.ind = "black",
                col.var = traits,
                legend.title = list(fill = "Species", color = "Parameters"),
                repel = TRUE )+ 
  ggpubr::fill_palette("cosmic")+ # Individual fill color
  ggpubr::color_palette(c("brown", "purple", "red","blue")) +  # Variable colors
  theme_gray() + 
  theme(legend.position = "right", 
        legend.text = element_text(face="italic"),
        plot.caption = element_text(hjust = 0),
        legend.key.size = unit(0.5, 'cm'),
        legend.background = element_rect(fill='transparent'),
        panel.background = element_rect(colour = "grey30")) +
  labs(title = "", x= "PC1 (62.56%)", y= "PC2 (29.10%)", 
       caption = "*1: Control , 2: Heat stress")
Genotype  Species  SPAD1  Pn1    Y(II)1  DMC1   SPAD2  Pn2    Y(II)2  DMC2
BEL       sp1      0.6    14.38  0.25    0.21   1.64   16.5   0.29    -0.4
BGB003    sp2      -0.24  14.87  0.2     -1.24  -0.33  16.63  0.27    -1.24
BGB008    sp2      -0.54  11.92  0.14    -1.24  -0.37  12.6   0.15    -0.72
BGB083    sp3      -1.18  6.61   0.13    0.74   -0.04  5.41   0.16    0.63
BGB086    sp3      -0.89  9.05   0.19    -0.53  -0.33  11.2   0.17    -0.28
BGB088    sp4      -0.4   8.75   0.15    -0.39  0.28   12.36  0.22    -0.6
BGB089    sp4      -0.52  9.86   0.2     -0.05  0.47   11.06  0.19    -0.44
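The "Y.II." renaming most likely comes from R's syntactic-name check rather than from factoextra: read.table()/read.csv() and data.frame() pass column names through make.names() unless check.names = FALSE, and the PCA() call above wraps the data in data.frame() once more. A minimal base-R sketch of that behaviour (the small data frames are made-up illustrations, not the question's data):

```r
# make.names() replaces characters that are not allowed in syntactic
# names, such as "(" and ")", with "."
mangled_names <- make.names(c("SPAD1", "Y(II)1", "Y(II)2"))
print(mangled_names)  # gives "SPAD1", "Y.II.1", "Y.II.2"

# data.frame() applies the same check by default (check.names = TRUE) ...
mangled <- data.frame(`Y(II)1` = c(0.25, 0.20))
print(names(mangled))  # gives "Y.II.1"

# ... and keeps the name as typed when check.names = FALSE
kept <- data.frame(`Y(II)1` = c(0.25, 0.20), check.names = FALSE)
print(names(kept))  # gives "Y(II)1"
```

Passing check.names = FALSE both when reading the file and inside the data.frame() call should keep "Y(II)1" as the column name; how factoextra then displays it has not been checked here. Note also that "Y(II)1" on its own is not a valid R expression, so it would still need editing before being drawn with parse = TRUE.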

Solution

  • The variable names are stored as the row names of the matrices in pca$var (coord, cor, cos2 and contrib), and factoextra takes the arrow labels from there. You can edit them to get the labels you want. In particular, you can change e.g. DMC1 to DMC^1 and so on, which plotmath interprets as a superscript:

    pca$var <- lapply(pca$var, \(d) {
        # Replace Y.II. with Y(II)
        rownames(d) <- gsub("\\.II\\.", "(II)", rownames(d))
        # Replace e.g. DMC1 with DMC^1
        rownames(d) <- gsub("(1|2)$", "^\\1", rownames(d))
        d
    })
    

    When drawing your plot, make sure to add parse = TRUE so that labels such as DMC^1 are parsed as plotmath expressions and drawn with a superscript.

    fviz_pca_biplot(pca,
        geom.ind = c("point", "text"),
        pointshape = 21,
        pointsize = 2.5,
        fill.ind = data$Species,
        col.ind = "black",
        col.var = traits,
        legend.title = list(fill = "Species", color = "Parameters"),
        repel = TRUE,
        parse = TRUE # this is important
    ) # + all your theme code goes here
    

    Output:

    [Image: resulting biplot with superscripted variable labels]
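The two gsub() substitutions in the answer can be checked on a plain character vector, without building the PCA. The sketch below assumes the row names look like the question's columns after make.names() (a hypothetical vector typed in by hand) and also shows a variant that maps the condition codes 1 and 2 to the CT and HS superscripts asked for in the question. The parse() loop only confirms that every label is a valid R expression, which parse = TRUE requires.

```r
# Hypothetical row names, as they would look after make.names()
vars <- c("SPAD1", "Pn1", "Y.II.1", "DMC1", "SPAD2", "Pn2", "Y.II.2", "DMC2")

# Step 1: restore "Y(II)" from the mangled "Y.II."
restored <- gsub("\\.II\\.", "(II)", vars)

# Step 2 (the answer's version): trailing 1 or 2 becomes a superscript
numbered <- gsub("(1|2)$", "^\\1", restored)

# Variant: superscript CT for control (1) and HS for heat stress (2)
labelled <- sub("2$", "^HS", sub("1$", "^CT", restored))

print(numbered)
print(labelled)

# Each label must parse as an R expression for parse = TRUE to work;
# parse() stops with an error if any of them is invalid
invisible(lapply(c(numbered, labelled), function(s) parse(text = s)))
```

To get the CT/HS labels on the real plot, replace the second gsub() inside the lapply() with the nested sub() calls; plotmath should then draw Y(II)^CT as Y(II) followed by a CT superscript.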