
Reading data only once and applying the same function to different variables


The dataset I am working on looks like this: DATA. There are 6 different countries, and r_1..r_13 are the reasons. I want to apply PCA to this dataset to find the significant reasons for each country. My question is: how can I run PCA for each country without reading the file separately for each one? I want to read the entire file once, as shown above. Also, please check the code I am using for PCA:

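    # 'numeric' holds only the numeric columns r_1..r_13
    pca <- prcomp(numeric, center = TRUE, scale. = TRUE)
    summary(pca)
    # eigenvalues = variances of the principal components
    eigen_val <- pca$sdev^2
    sum(eigen_val)  # equals the number of variables (13) when scale. = TRUE
    # proportion of the total variance explained by each component
    prop_var <- round(eigen_val / sum(eigen_val), 4)
    round(sum(prop_var[1:13]), 4)  # all 13 components together: approximately 1
    # loadings: one row per original variable, one column per PC
    load <- pca$rotation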
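The manual eigenvalue arithmetic above can be cross-checked against what `summary()` already reports for a `prcomp` object. A minimal sketch, assuming the four iris measurements stand in for the r_1..r_13 columns (`num` is a hypothetical stand-in name):

```r
# Stand-in data: the four numeric iris columns play the role of r_1..r_13
num <- iris[, 1:4]
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix = variances of the PCs
eigen_val <- pca$sdev^2
# With scale. = TRUE they sum to the number of variables (4 here)
sum(eigen_val)

# Share of total variance per component, unrounded
prop_var <- eigen_val / sum(eigen_val)

# summary.prcomp() reports the same shares, rounded to 5 digits
imp <- summary(pca)$importance
imp["Proportion of Variance", ]

# Cumulative share, component by component (last value is 1)
cumsum(prop_var)
```

`cumsum(prop_var)` is usually more informative than summing all components at once, because it shows how many leading PCs are needed to reach a given share of the variance.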

After computing the rotation matrix, I will check which PCs are most strongly correlated with which observed variables and decide each variable's significance accordingly (the more PCs a variable is correlated with, the more significant it is). Could you tell me whether this approach is correct? Thanks!
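One detail relevant to this approach: `pca$rotation` holds the loadings (eigenvectors of the correlation matrix when `scale. = TRUE`), not the variable/PC correlations themselves. A minimal sketch of how those correlations could be obtained, again assuming the iris measurements as a stand-in for r_1..r_13:

```r
# Stand-in data: the four numeric iris columns play the role of r_1..r_13
num <- iris[, 1:4]
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Direct computation: correlate each original variable with each PC score
cor_direct <- cor(num, pca$x)

# Same quantity from the rotation matrix: with scaled data the
# correlation equals loading * standard deviation of that component
cor_from_loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

# When all components are kept, each variable's squared correlations
# sum to 1, so comparisons are only informative over the leading PCs
rowSums(cor_direct^2)
```

Because every variable's squared correlations add up to 1 across all components, counting correlated PCs only distinguishes variables once you restrict attention to the first few components.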


Solution

  • Here's a simple starting point that you can adapt to get the results in your desired format. Suppose you're working with the iris dataset in R and want to run PCA for each Species, much as you want to run PCA for each country in your data.

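    data(iris)  # iris ships with base R, so no extra package is needed
    # one data frame per species (per country in your data)
    Iris <- split(iris, iris$Species)
    for (i in seq_along(Iris)) {
      # centred, scaled PCA on every column except Species; stored as pca1, pca2, pca3
      assign(paste0("pca", i),
             prcomp(Iris[[i]][names(iris) != "Species"], center = TRUE, scale. = TRUE))
    }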
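The same idea can also keep all results in one named list instead of separate `pca1`, `pca2`, … variables, which makes them easier to loop over afterwards. A sketch under these assumptions: `pca_by_group` is a hypothetical helper name, and `group_col` names the grouping column (iris's `Species` here, a country column in your data).

```r
# Hypothetical helper: one PCA per group, returned as a named list
pca_by_group <- function(df, group_col) {
  groups <- split(df, df[[group_col]])
  lapply(groups, function(g) {
    # drop the grouping column, then keep only numeric columns
    cols <- setdiff(names(g), group_col)
    num <- g[cols][vapply(g[cols], is.numeric, logical(1))]
    prcomp(num, center = TRUE, scale. = TRUE)
  })
}

# iris stands in for your data; Species plays the role of country
pcas <- pca_by_group(iris, "Species")
names(pcas)              # one PCA per species
pcas$setosa$rotation     # loadings for a single group
lapply(pcas, summary)    # variance explained for every group
```

With your data you would read the file only once, for example `dat <- read.csv("your_data.csv")` (file name hypothetical), and then call `pca_by_group(dat, "country")`, assuming the column holding the country is called `country`. Note that `scale. = TRUE` fails if a reason column is constant within one country, so such columns would need to be dropped for that group.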