Tags: r, ggplot2, plot, tidyverse, pca

How can I put a scree plot on the same scale as the principal components in my PCA plot?


I used the following commands to make a scree plot, in which the first dimension accounts for most of the variation.

library(factoextra)  # provides fviz_eig()

res.pca <- prcomp(log2(src1+1), scale. = TRUE)
res.pca
plot1 <- fviz_eig(res.pca)
plot1

[Figure: scree plot produced by fviz_eig]

Here are the standard deviations of the principal components (36 samples):

Standard deviations (1, .., p=36):
 [1] 5.95582467 0.28407652 0.26522238 0.20868660 0.20012316 0.16888365 0.15432002 0.14181776 0.13427364
[10] 0.13116676 0.11774602 0.11533978 0.11221367 0.10495140 0.10142414 0.09890213 0.09604759 0.09339936
[19] 0.09077357 0.08893056 0.08650105 0.08548026 0.08308853 0.08097912 0.07497496 0.07413417 0.07224579
[28] 0.07124431 0.06996434 0.06759544 0.06335228 0.06141117 0.06091347 0.05944077 0.05849182 0.05754510

and my PCA plot is:

[Figure: PCA plot of PC1 against PC2]

How can I draw the scree plot so that its dimensions show the same percentages as the PCA plot (e.g. PC1 = 15.55% and PC2 = 13.82%)?


Solution

  • You can do something like this. In your case, you need to bind your group information to the data frame used for the PCA plot:

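    library(ggfortify)
    library(ggplot2)
    library(patchwork)
    
    set.seed(111)
    data <- mtcars
    # Make up a grouping variable for this example
    data$group <- sample(letters[1:3], nrow(data), replace = TRUE)
    
    # Run the PCA on everything except the group column (the last one)
    res.pca <- prcomp(log2(data[, -ncol(data)] + 1))
    # The axis labels show the percent variance explained by each component
    autoplot(res.pca, data = data, colour = "group")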
    

    [Figure: PCA plot from autoplot, colored by group]
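
    The axis percentages on a PCA plot are each component's share of the total variance, so any plot built from the same prcomp object should report the same numbers. A minimal base R sketch of that calculation, assuming mtcars as stand-in data; axisLabels and fromSummary are names made up for illustration:

    ```r
    # Sketch only: base R, with mtcars standing in for the real data.
    res.pca <- prcomp(log2(mtcars + 1))

    # Percent variance explained by each component
    varExp <- 100 * res.pca$sdev^2 / sum(res.pca$sdev^2)

    # Hypothetical axis labels for a PC1 vs PC2 plot
    axisLabels <- sprintf("PC%d (%.2f%%)", 1:2, varExp[1:2])
    print(axisLabels)

    # Cross-check against summary(), which rounds proportions to 5 decimals
    fromSummary <- 100 * summary(res.pca)$importance["Proportion of Variance", ]
    print(all(abs(varExp - fromSummary) < 0.01))
    ```

    If the final check prints TRUE, the hand-computed percentages agree with what summary(res.pca) reports.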

    Then use the same PCA object to make the scree plot, so that both plots report the same percentages:

    # Percent variance explained by each component
    varExp <- 100 * res.pca$sdev^2 / sum(res.pca$sdev^2)
    varDF <- data.frame(Dimensions = seq_along(varExp),
                        varExp = varExp)
    
    # Bars first, so the line and points are drawn on top of them
    ggplot(varDF, aes(x = Dimensions, y = varExp)) +
      geom_col(fill = "steelblue") +
      geom_line() +
      geom_point() +
      theme_bw() +
      scale_x_continuous(breaks = seq_len(nrow(varDF))) +
      ylim(c(0, 100)) +
      ylab("Percent variance explained")
    

    [Figure: scree plot of percent variance explained per dimension]
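
    The same formula can be applied to the standard deviations printed in the question, as a quick way to check which PCA a plot came from. A rough sketch; sdev is copied from the printed output above, and pct is a name made up here:

    ```r
    # Standard deviations copied from the res.pca printout in the question
    sdev <- c(5.95582467, 0.28407652, 0.26522238, 0.20868660, 0.20012316,
              0.16888365, 0.15432002, 0.14181776, 0.13427364, 0.13116676,
              0.11774602, 0.11533978, 0.11221367, 0.10495140, 0.10142414,
              0.09890213, 0.09604759, 0.09339936, 0.09077357, 0.08893056,
              0.08650105, 0.08548026, 0.08308853, 0.08097912, 0.07497496,
              0.07413417, 0.07224579, 0.07124431, 0.06996434, 0.06759544,
              0.06335228, 0.06141117, 0.06091347, 0.05944077, 0.05849182,
              0.05754510)

    # Percent variance explained by each component
    pct <- 100 * sdev^2 / sum(sdev^2)
    print(round(pct[1:2], 2))
    ```

    With these values PC1 comes out near 98.5%, which is consistent with the scree plot but not with the 15.55% shown on the PCA plot. That suggests the PCA plot may have been built from a different PCA than res.pca, which is why both plots should come from the same object.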