I have a Dataframe as follows.....
df <-data.frame(variableA, variableB, variableC, variableD, variableE)
prcomp(scale(df))
summary(prcomp)
gives the following results
PC1 PC2 PC3 PC4 PC5
Cumulative Proportion 0.5127 0.7222 0.8938 0.96075 1.00000
Is there a way I add the PCs to a new dataframe, up until the Cumulative Proportion reaches 85%?
I have a number of Dataframes that I wish to run this for, they vary in size but I would like 85% to be an arbitrary cut off point.
Here is an approach to identify the components explaining up to 85% variance, using the spam
data from the kernlab
package.
library(kernlab)
data(spam)
# log transform independent variables, ensuring all values above 0
princomp <- prcomp(log10(spam[,-58]+1))
stats <- summary(princomp)
# extract variable importance and list items explaining up to 85% variance
importance <- stats$importance[3,]
importance[importance <= 0.85]
...and the output:
> importance[importance <= 0.85]
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11
0.49761 0.58021 0.63101 0.67502 0.70835 0.73188 0.75100 0.76643 0.78044 0.79368 0.80648
PC12 PC13 PC14
0.81886 0.83046 0.84129
>
We can obtain the factor scores for the first 14 components and save them as a data frame as follows.
resultNames <- names(importance[importance <= 0.85])
# return factor scores
x_result <- as.data.frame(princomp$x[,resultNames])
head(x_result)
...and the output:
> head(x_result)
PC1 PC2 PC3 PC4 PC5 PC6 PC7
1 0.7364988 0.19181730 0.041818854 -0.009236399 0.001232911 0.03723833 -0.01144332
2 1.3478167 0.22953561 -0.149444409 0.091569400 -0.148434128 -0.01923707 -0.07119210
3 2.0489632 -0.02668038 0.222492079 -0.107120738 -0.092968198 -0.06400683 -0.07078830
4 0.4912016 0.20921288 -0.002072148 0.015524007 -0.002347262 -0.14519336 -0.09238828
5 0.4911676 0.20916725 -0.002122664 0.015467369 -0.002373622 -0.14517812 -0.09243136
6 -0.2337956 -0.10508875 0.187831101 -0.335491660 0.099445713 0.09516875 0.11234080
PC8 PC9 PC10 PC11 PC12 PC13 PC14
1 -0.08745771 0.079650230 -0.14450436 0.15945517 -0.06490913 -0.042909658 0.05739735
2 0.00233124 -0.091471125 -0.10304536 0.06973190 0.09373344 0.003069536 0.02892939
3 -0.10888375 0.227437609 -0.07419313 0.08217271 -0.12488575 0.150950134 0.05180459
4 -0.15862241 0.003044418 0.01609690 0.01720151 0.02313224 0.142176889 -0.04013102
5 -0.15848785 0.002944493 0.01606874 0.01725410 0.02304496 0.142527110 -0.04007788
6 -0.13790588 0.197294502 0.07851300 -0.08131269 -0.02091459 0.246810914 -0.01869192
>
To merge the data with the original data frame, we can use cbind()
.
mergedData <- cbind(spam,x_result)