This seems like a trivial problem, but I haven't been able to resolve it!
I have taken the numeric columns of the iris data set and then normalized them as below:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 -0.8976739 1.01560199 -1.335752 -1.311052
2 -1.1392005 -0.13153881 -1.335752 -1.311052
3 -1.3807271 0.32731751 -1.392399 -1.311052
4 -1.5014904 0.09788935 -1.279104 -1.311052
5 -1.0184372 1.24503015 -1.335752 -1.311052
6 -0.5353840 1.93331463 -1.165809 -1.048667
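The question does not show the normalization step. The values in the table are consistent with a z-score standardization (each column centred on its mean and divided by its standard deviation), so here is a minimal sketch assuming base R's scale() was used. The name iris.norm matches the object passed to prcomp() below.

```r
# Assumption: "normalized" means z-score standardization, which is what
# base R's scale() does (centre each column, then divide by its sd).
iris.norm <- as.data.frame(scale(iris[, 1:4]))  # numeric columns only
head(iris.norm)
```

Wrapping scale() in as.data.frame() gives a data frame with the original column names, which matches the printed table.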
# perform PCA
pccomp <- prcomp(iris.norm)
a <- summary(pccomp)
df <- as.data.frame(t(a$importance))  # one row per principal component
## Standard deviation Proportion of Variance Cumulative Proportion
## PC1 1.7083611 0.72962 0.72962
## PC2 0.9560494 0.22851 0.95813
## PC3 0.3830886 0.03669 0.99482
## PC4 0.1439265 0.00518 1.00000
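For reference, the proportions reported by summary() can be recomputed from the component standard deviations in pccomp$sdev: each PC's share of the total variance is sdev^2 / sum(sdev^2). A minimal sketch, assuming iris.norm is the z-score standardized iris data; the names pc_var, prop_var and cum_prop are only illustrative.

```r
iris.norm <- scale(iris[, 1:4])   # assumed standardization
pccomp <- prcomp(iris.norm)

# Each component's variance is its squared standard deviation
pc_var   <- pccomp$sdev^2
prop_var <- pc_var / sum(pc_var)  # share of total variance per PC
cum_prop <- cumsum(prop_var)      # running total of those shares

# summary() stores the same quantities in its $importance matrix,
# with the two proportion rows rounded to 5 decimals
imp <- summary(pccomp)$importance
print(rbind(prop_var, cum_prop))
print(imp)
```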
Now I convert the row names of df into a column, so that the PC labels (previously row names) form the first column for further manipulation:
library(tibble)  # provides rownames_to_column()
df <- rownames_to_column(df, var = "PrinComp")
## PrinComp Standard deviation Proportion of Variance Cumulative Proportion
## 1 PC1 1.7083611 0.72962 0.72962
## 2 PC2 0.9560494 0.22851 0.95813
## 3 PC3 0.3830886 0.03669 0.99482
## 4 PC4 0.1439265 0.00518 1.00000
# Now select only those PCs whose cumulative proportion is below, say, 96%
# subsetting
pcs <- df$PrinComp[df$`Cumulative Proportion` < 0.96]  # cumulative prop less than 96%
## [1] "PC1" "PC2"
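The transpose and rownames_to_column() steps are not strictly needed for this selection: summary(pccomp)$importance is a matrix whose columns are already named PC1, PC2, ..., so the labels can be taken from its "Cumulative Proportion" row directly. A sketch assuming iris.norm is the z-score standardized iris data; n_pc is an illustrative name.

```r
iris.norm <- scale(iris[, 1:4])   # assumed standardization
pccomp <- prcomp(iris.norm)

threshold <- 0.96
cum_prop <- summary(pccomp)$importance["Cumulative Proportion", ]

# Names of the PCs whose cumulative proportion is below the threshold
pcs  <- names(cum_prop)[cum_prop < threshold]
n_pc <- length(pcs)  # how many PCs to keep
print(pcs)
```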
Now I create the PC data frame statically from the scores of the first 2 principal components, which we got from the condition above (cumulative proportion < 0.96):
x1 <- pccomp$x[,1]
x2 <- pccomp$x[,2]
pcdf <- cbind(x1,x2)
## x1 x2
## [1,] -2.257141 -0.4784238
## [2,] -2.074013 0.6718827
## [3,] -2.356335 0.3407664
## [4,] -2.291707 0.5953999
## [5,] -2.381863 -0.6446757
## [6,] -2.068701 -1.4842053
My question: how can I create the above PC data frame dynamically, once I know the number of PCs from a condition such as the cumulative proportion being less than, say, 0.95?
You can run a while loop over the Cumulative Proportion column of df, appending each component's scores for as long as its cumulative proportion is below the required threshold.
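threshold = 0.96
pcdf = list()
i = 1
# keep adding PC scores while the cumulative proportion stays below the threshold
while (df$`Cumulative Proportion`[i] < threshold) {
  pcdf[[i]] = pccomp$x[, i]
  i = i + 1
}
pcdf = as.data.frame(pcdf)
names(pcdf) = paste("x", seq_len(ncol(pcdf)), sep = "")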
The output
> head(pcdf)
x1 x2
1 -2.257141 -0.4784238
2 -2.074013 0.6718827
3 -2.356335 0.3407664
4 -2.291707 0.5953999
5 -2.381863 -0.6446757
6 -2.068701 -1.4842053
With threshold = 0.999, running the same code gives:
> head(pcdf)
x1 x2 x3
1 -2.257141 -0.4784238 0.12727962
2 -2.074013 0.6718827 0.23382552
3 -2.356335 0.3407664 -0.04405390
4 -2.291707 0.5953999 -0.09098530
5 -2.381863 -0.6446757 -0.01568565
6 -2.068701 -1.4842053 -0.02687825
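Because the cumulative proportion never decreases, the PCs picked by the while loop are always a leading block of columns, so the loop can also be replaced by logical column indexing on pccomp$x. The sketch below wraps this in a hypothetical helper, select_pc_scores(), which is not part of any package; it assumes iris.norm is the z-score standardized iris data.

```r
iris.norm <- scale(iris[, 1:4])   # assumed standardization
pccomp <- prcomp(iris.norm)

# Hypothetical helper: return the scores of every PC whose cumulative
# proportion of variance is below `threshold`, as columns x1, x2, ...
select_pc_scores <- function(pca, threshold) {
  cum_prop <- summary(pca)$importance["Cumulative Proportion", ]
  keep <- cum_prop < threshold
  # drop = FALSE keeps a matrix even if only one PC is selected
  scores <- as.data.frame(pca$x[, keep, drop = FALSE])
  names(scores) <- paste0("x", seq_len(ncol(scores)))
  scores
}

head(select_pc_scores(pccomp, 0.96))
head(select_pc_scores(pccomp, 0.999))
```

Indexing with a logical vector selects all qualifying columns in one step, so there is no list to grow and no loop counter that could run past the last component.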
Assuming you already know the number of principal components you want, say i, you can use
a <- sapply(seq_len(i), function(k) pccomp$x[, k])
instead of the whole while loop section.
So for i = 2 you get:
> head(a)
[,1] [,2]
[1,] -2.257141 -0.4784238
[2,] -2.074013 0.6718827
[3,] -2.356335 0.3407664
[4,] -2.291707 0.5953999
[5,] -2.381863 -0.6446757
[6,] -2.068701 -1.4842053
where a is your result.
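When i is known in advance, sapply() is not strictly needed either: matrix indexing on pccomp$x returns the first i score columns at once and keeps their PC1, PC2, ... column names. A minimal sketch, assuming iris.norm is the z-score standardized iris data.

```r
iris.norm <- scale(iris[, 1:4])   # assumed standardization
pccomp <- prcomp(iris.norm)

i <- 2  # number of PCs to keep
# drop = FALSE keeps a one-column matrix (not a vector) when i == 1
a <- pccomp$x[, seq_len(i), drop = FALSE]
head(a)
```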