Search code examples
rpca

Extract PCn of a PCA Analysis


I have a Dataframe as follows.....

df <-data.frame(variableA, variableB, variableC, variableD, variableE)

prcomp(scale(df))
summary(prcomp)

gives the following results

                          PC1    PC2    PC3     PC4     PC5
Cumulative Proportion  0.5127 0.7222 0.8938 0.96075 1.00000

Is there a way I add the PCs to a new dataframe, up until the Cumulative Proportion reaches 85%?

I have a number of Dataframes that I wish to run this for, they vary in size but I would like 85% to be an arbitrary cut off point.


Solution

  • Here is an approach to identify the components explaining up to 85% variance, using the spam data from the kernlab package.

    library(kernlab)
    data(spam)
    # log transform independent variables, ensuring all values above 0
    princomp <- prcomp(log10(spam[,-58]+1))
    stats <- summary(princomp)
    # extract variable importance and list items explaining up to 85% variance
    importance <- stats$importance[3,]
    importance[importance <= 0.85]
    

    ...and the output:

    > importance[importance <= 0.85]
        PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10    PC11 
    0.49761 0.58021 0.63101 0.67502 0.70835 0.73188 0.75100 0.76643 0.78044 0.79368 0.80648 
       PC12    PC13    PC14 
    0.81886 0.83046 0.84129 
    >
    

    We can obtain the factor scores for the first 14 components and save them as a data frame as follows.

    resultNames <- names(importance[importance <= 0.85])
    # return factor scores 
    x_result <- as.data.frame(princomp$x[,resultNames])
    head(x_result)
    

    ...and the output:

    > head(x_result)
             PC1         PC2          PC3          PC4          PC5         PC6         PC7
    1  0.7364988  0.19181730  0.041818854 -0.009236399  0.001232911  0.03723833 -0.01144332
    2  1.3478167  0.22953561 -0.149444409  0.091569400 -0.148434128 -0.01923707 -0.07119210
    3  2.0489632 -0.02668038  0.222492079 -0.107120738 -0.092968198 -0.06400683 -0.07078830
    4  0.4912016  0.20921288 -0.002072148  0.015524007 -0.002347262 -0.14519336 -0.09238828
    5  0.4911676  0.20916725 -0.002122664  0.015467369 -0.002373622 -0.14517812 -0.09243136
    6 -0.2337956 -0.10508875  0.187831101 -0.335491660  0.099445713  0.09516875  0.11234080
              PC8          PC9        PC10        PC11        PC12         PC13        PC14
    1 -0.08745771  0.079650230 -0.14450436  0.15945517 -0.06490913 -0.042909658  0.05739735
    2  0.00233124 -0.091471125 -0.10304536  0.06973190  0.09373344  0.003069536  0.02892939
    3 -0.10888375  0.227437609 -0.07419313  0.08217271 -0.12488575  0.150950134  0.05180459
    4 -0.15862241  0.003044418  0.01609690  0.01720151  0.02313224  0.142176889 -0.04013102
    5 -0.15848785  0.002944493  0.01606874  0.01725410  0.02304496  0.142527110 -0.04007788
    6 -0.13790588  0.197294502  0.07851300 -0.08131269 -0.02091459  0.246810914 -0.01869192
    > 
    

    To merge the data with the original data frame, we can use cbind().

    mergedData <- cbind(spam,x_result)