PCA while preserving row order in R


I want to run a PCA with prcomp on a dataset whose first two columns are factors (with repeated values), followed by numeric columns:

Genus1  Species1    6.320000    8.720000    6.420000
Genus2  Species2    8.430000    11.780000   4.490000
Genus2  Species2    8.310000    10.940000   4.180000
Genus3  Species3    9.290000    13.060000   5.990000
Genus3  Species3    8.960000    13.320000   6.36000

How can I get this dataset into the correct format for prcomp so that the PC scores are in the same order as the rows of the original dataset?


Solution

  • Let's say your data is:

    x = structure(list(V1 = structure(c(1L, 2L, 2L, 3L, 3L), .Label = c("Genus1", 
    "Genus2", "Genus3"), class = "factor"), V2 = structure(c(1L, 
    2L, 2L, 3L, 3L), .Label = c("Species1", "Species2", "Species3"
    ), class = "factor"), V3 = c(6.32, 8.43, 8.31, 9.29, 8.96), V4 = c(8.72, 
    11.78, 10.94, 13.06, 13.32), V5 = c(6.42, 4.49, 4.18, 5.99, 6.36
    )), class = "data.frame", row.names = c(NA, -5L))
    

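    prcomp only works on numeric columns, so run it on columns 3 to 5 and leave the factors out. The scores in pca$x come back one row per input row, in the same order, so you can bind the factor columns straight back on: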

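    # PCA on the numeric columns only (centered, not scaled, by default)
    pca = prcomp(x[,3:5])
    # pca$x holds the scores in input row order; reattach the labels
    pca_scores = cbind(x[,1:2],pca$x)
    pca_scores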
          V1       V2        PC1        PC2          PC3
    1 Genus1 Species1 -3.4571239  0.8812539  0.003197962
    2 Genus2 Species2  0.2914003 -0.9790128 -0.165842662
    3 Genus2 Species2 -0.4813849 -1.3641274  0.099844800
    4 Genus3 Species3  1.8024971  0.5080058  0.199344981
    5 Genus3 Species3  1.8446114  0.9538805 -0.136545080
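
If you want to convince yourself that the row order really is preserved, one option is to recompute the scores by hand and compare them with pca$x. The following is only a sketch under a few assumptions: the data frame is rebuilt with data.frame() instead of structure(), the numeric columns are picked by type rather than by position, and the names is_num, manual_scores and rows_match are made up for this illustration.

```r
# Same five rows as in the example above.
x <- data.frame(
  V1 = factor(c("Genus1", "Genus2", "Genus2", "Genus3", "Genus3")),
  V2 = factor(c("Species1", "Species2", "Species2", "Species3", "Species3")),
  V3 = c(6.32, 8.43, 8.31, 9.29, 8.96),
  V4 = c(8.72, 11.78, 10.94, 13.06, 13.32),
  V5 = c(6.42, 4.49, 4.18, 5.99, 6.36)
)

# Select numeric columns by type instead of hard-coding 3:5.
is_num <- vapply(x, is.numeric, logical(1))

pca <- prcomp(x[, is_num, drop = FALSE])

# Recompute the scores manually: centre each row with the PCA's
# column means, then project onto the loadings (rotation matrix).
centred <- scale(as.matrix(x[, is_num, drop = FALSE]),
                 center = pca$center, scale = FALSE)
manual_scores <- centred %*% pca$rotation

# TRUE when row i of pca$x is the score of row i of x.
rows_match <- isTRUE(all.equal(unname(manual_scores), unname(pca$x)))

# Reattach the non-numeric label columns in the original order.
pca_scores <- cbind(x[, !is_num, drop = FALSE], pca$x)
```

Because the comparison is done row by row, rows_match would be FALSE if prcomp had reordered anything. Selecting columns with is_num also keeps the code working if more label or measurement columns are added later.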