
Using predict() to project new data into PCA space in R


After performing a principal component analysis of a first data set (a), I projected a second data set (b) into the PCA space of the first data set.

From this, I want to extract the variable loadings for the projected analysis of (b). The variable loadings of the PCA of (a) are returned by prcomp(). How can I retrieve the variable loadings of (b), projected into the PCA space of (a)?

# set seed and define variables
set.seed(1)
a = replicate(10, rnorm(10))
b = replicate(10, rnorm(10))

# pca of data A and project B into PCA space of A
pca.a = prcomp(a)
project.b = predict(pca.a, b)

# variable loadings
loads.a = pca.a$rotation

Solution

  • Here's an annotated version of your code to make it clear what is happening at each step. First, the original PCA is performed on matrix a:

    pca.a = prcomp(a)
    

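    This calculates the loadings for each principal component (PC), stored in pca.a$rotation, along with the column means of a, stored in pca.a$center. At the next step, predict() centres the new data set, b, with those means and multiplies it by the same loadings to calculate PC scores: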

    project.b = predict(pca.a, b)
    

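    So, the loadings are the same for both data sets (there is no separate set of loadings for b; pca.a$rotation is the one to use), but the PC scores are different. If we look at project.b, we see that each row is an observation from b and each column corresponds to a PC: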

                PC1         PC2         PC3        PC4         PC5          PC6         PC7         PC8
     [1,] -0.2922447  0.10253581  0.55873366  1.3168437  1.93686163  0.998935945  2.14832483 -1.43922296
     [2,]  0.1855480 -0.97631967 -0.06419207  0.6375200 -1.63994127  0.110028191 -0.27612541 -0.37640710
     [3,] -1.5924242  0.31368878 -0.63199409 -0.2535251  0.59116005  0.214116915  1.20873962 -0.64494388
     [4,]  1.2117977  0.29213928  1.53928110 -0.7755299  0.16586295  0.030802395  0.63225374 -1.72053189
     [5,]  0.5637298  0.13836395 -1.41236348  0.2931681 -0.64187233  1.035226594  0.67933996 -1.05234872
     [6,]  0.2874210  1.18573157  0.04358772 -1.1941734 -0.04399808 -0.113752847 -0.33507195 -1.34592414
     [7,]  0.5629731 -1.02835365  0.36218131  1.4117908 -0.96923175 -1.213684882  0.02221423  1.14483112
     [8,]  1.2854406  0.09373952 -1.46038333  0.6885674  0.39455369  0.756654205  1.97699073 -1.17281174
     [9,]  0.8573656  0.07810452 -0.06576772 -0.5200661  0.22985518  0.007571489  2.29289637 -0.79979214
    [10,]  0.1650144 -0.50060018 -0.14882996  0.2065622  2.79581428  0.813803739  0.71632238  0.09845912
                  PC9      PC10
     [1,] -0.19795112 0.7914249
     [2,]  1.09531789 0.4595785
     [3,] -1.50564724 0.2509829
     [4,]  0.05073079 0.6066653
     [5,] -1.62126318 0.1959087
     [6,]  0.14899277 2.9140809
     [7,]  1.81473300 0.0617095
     [8,]  1.47422298 0.6670124
     [9,] -0.53998583 0.7051178
    [10,]  0.80919039 1.5207123
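
    If it helps, here is a minimal sketch of the projection done by hand, so you can see that the loadings of a are the only loadings involved. It assumes the default prcomp() settings used above (center = TRUE, scale. = FALSE); b.centered and manual.b are just names I made up for illustration:

    ```r
    # Recreate the objects from the question
    set.seed(1)
    a <- replicate(10, rnorm(10))
    b <- replicate(10, rnorm(10))

    pca.a <- prcomp(a)
    project.b <- predict(pca.a, b)

    # Centre b with the column means of a (not the means of b)
    b.centered <- sweep(b, 2, pca.a$center)

    # Rotate the centred data with the loadings of a
    manual.b <- b.centered %*% pca.a$rotation

    # Compare with predict(); the two should agree up to floating-point error
    all.equal(unname(manual.b), unname(project.b))
    ```

    If you had called prcomp() with scale. = TRUE, you would also need to divide each column by pca.a$scale before rotating.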
    

    Hopefully, that makes sense, but I'm yet to finish my first coffee of the day, so no guarantees.