Plotting cumulative variance from PCA in R

I am working in R with a scaled dataset and principal component analysis (princomp). Everything works fine, but I would like to plot the cumulative percentage of variance explained by the principal components. The summary shows this information, but I have not found a way to access it. In other words, I want to plot y = 'Cumulative Proportion' from the PCA against x = component number.

pca <- princomp(class5_subset_scaled)
summary(pca) # prints the following

Importance of components:
                          Comp.1     Comp.2 ...
Standard deviation     0.0513980 0.04482971 ...
Proportion of Variance 0.2089728 0.15897513 ...
Cumulative Proportion  0.2089728 0.36794789 ...

However, when I look at names(pca), I am puzzled, because none of the components holds that row:

[1] "sdev" "loadings" "center" "scale" "n.obs" "scores" "call" 

Can I plot y='Cumulative Proportion' from pca vs. x='component#'?


  • You do not provide any data, so I will illustrate with the built-in iris data set. The summary shows what you want to get.

    iPCA = princomp(iris[,1:4])
    summary(iPCA)
    Importance of components:
                              Comp.1     Comp.2     Comp.3      Comp.4
    Standard deviation     2.0494032 0.49097143 0.27872586 0.153870700
    Proportion of Variance 0.9246187 0.05306648 0.01710261 0.005212184
    Cumulative Proportion  0.9246187 0.97768521 0.99478782 1.000000000

    As you noticed, the object returned by princomp has a component called sdev, which holds the "Standard deviation" row.

    iPCA$sdev
       Comp.1    Comp.2    Comp.3    Comp.4 
    2.0494032 0.4909714 0.2787259 0.1538707

    The variance is the square of the standard deviation.

    iPCA$sdev^2
        Comp.1     Comp.2     Comp.3     Comp.4 
    4.20005343 0.24105294 0.07768810 0.02367619

    The proportion of variance is the variance divided by the sum of all variances.

    iPCA$sdev^2 / sum(iPCA$sdev^2)
         Comp.1      Comp.2      Comp.3      Comp.4 
    0.924618723 0.053066483 0.017102610 0.005212184 

    The Cumulative Proportion is then the cumulative sum of the proportions of variance.

    cumsum(iPCA$sdev^2 / sum(iPCA$sdev^2))
       Comp.1    Comp.2    Comp.3    Comp.4 
    0.9246187 0.9776852 0.9947878 1.0000000
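
    The three rows that summary() prints can be rebuilt by hand from sdev, which helps because, as the names() output in the question suggests, the table is not available as a named component of the fit. A minimal sketch on the same iris fit; the names vars, prop, importance, and cum_prop are illustrative and not part of princomp:

    ```r
    # Fit the same PCA as above on the four numeric iris columns.
    iPCA <- princomp(iris[, 1:4])

    # Variance of each component is the squared standard deviation.
    vars <- iPCA$sdev^2
    prop <- vars / sum(vars)

    # Stack the rows in the order summary() prints them (illustrative object).
    importance <- rbind(
      "Standard deviation"     = iPCA$sdev,
      "Proportion of Variance" = prop,
      "Cumulative Proportion"  = cumsum(prop)
    )

    # Pull out just the row needed for plotting.
    cum_prop <- importance["Cumulative Proportion", ]
    print(round(cum_prop, 7))
    ```

    Keeping the values in a matrix like this makes it easy to pick any row by name later, for example importance["Proportion of Variance", ] for a scree-style plot.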

    Now that you have the Cumulative Proportion values, just plot them.

    plot(cumsum(iPCA$sdev^2 / sum(iPCA$sdev^2)), type="b")

    [Figure: Cumulative proportion]

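    Also, notice the scale on the y-axis: by default it covers only the range of the plotted values (roughly 0.92 to 1 here) rather than starting at 0. Depending on what you plan to do with the plot, you may prefer the full range from 0 to 1: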

    plot(cumsum(iPCA$sdev^2 / sum(iPCA$sdev^2)), type="b", ylim=0:1)

    [Figure: Cumulative plot to scale]
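
    For a more readable version, the same values can be plotted with explicit axis labels and one tick per component. This is only a sketch under assumptions: scale(iris[, 1:4]) stands in for the question's class5_subset_scaled, and the dashed line at 0.95 is an arbitrary illustrative threshold, not something taken from the question.

    ```r
    # Stand-in for the question's scaled data (assumption: a numeric matrix).
    x <- scale(iris[, 1:4])
    pca <- princomp(x)

    # Cumulative proportion of variance, computed from sdev as described above.
    cum_prop <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

    # Plot against the component index, with the y-axis fixed to [0, 1].
    plot(seq_along(cum_prop), cum_prop,
         type = "b", ylim = c(0, 1), xaxt = "n",
         xlab = "Component #", ylab = "Cumulative Proportion")
    axis(1, at = seq_along(cum_prop))   # one tick per component
    abline(h = 0.95, lty = 2)           # illustrative threshold only
    ```

    Suppressing the default x-axis with xaxt = "n" and redrawing it with axis() avoids fractional tick labels such as 1.5 when there are only a few components.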