
How do I find the link between principal components and raw data's variables?


I'm working with the prcomp() function in R. I was wondering if there is an easy way to see each variable's contribution to each principal component.

Or, if prcomp isn't designed for that, which PCA (or PCA-like) analysis can I use to answer these questions:

Which variables in the raw data are the most discriminant?

How can I see the contribution of the raw-data variables to all principal components?

Or, for each principal component:

PC1 = Var55 + Var2000 (or 78% Var55 + 22% Var2000)
PC2 = Var19 + Var32 + Var45
PC3 = ...

Solution

  • As @Axeman mentioned, you can look at the rotation matrix.

    If you have this PCA:

    pcaRes <- prcomp(df, scale. = TRUE)
    

    Then look at the rotation matrix, which holds the loadings:

    loadings <- pcaRes$rotation
    

    This shows how each variable contributes to each PCA axis: the larger the absolute value, the stronger the contribution, and a negative value indicates that the variable is negatively related to that axis.

    If you want the relative contribution of each variable, sum the absolute loadings for each PC axis (absolute values, so that negatives do not cancel), then divide each value by its column sum:

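    # You can do this the quick and dirty way:
    # transpose, divide each PC's row by its sum, transpose back
    t(t(abs(loadings))/rowSums(t(abs(loadings))))*100
    
    # or with the sweep() function, dividing each column by its sum
    sweep(x = abs(loadings), MARGIN = 2, 
      STATS = colSums(abs(loadings)), FUN = "/")*100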
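
To make the steps above concrete, here is a minimal sketch on the built-in USArrests dataset, which is only a stand-in for df. The names contribAbs, contribSq and varExplained are made up for illustration. The squared-loading version is an alternative I'm adding here, not part of the approach above: it relies on each column of the rotation matrix being a unit-length vector, so the squared loadings of a PC already sum to 1.

```r
# Stand-in data: USArrests ships with R and has 4 numeric variables.
pcaRes   <- prcomp(USArrests, scale. = TRUE)
loadings <- pcaRes$rotation

# Relative contribution (%) from absolute loadings, as described above.
contribAbs <- sweep(abs(loadings), MARGIN = 2,
                    STATS = colSums(abs(loadings)), FUN = "/") * 100

# Alternative (an assumption, not the method above): squared loadings.
# Each column of the rotation matrix has unit length, so the squares
# sum to 1 per PC and need no rescaling.
contribSq <- loadings^2 * 100

# Under both definitions every PC column should add up to 100.
colSums(contribAbs)
colSums(contribSq)

# Variables ranked by their contribution to PC1, largest first.
sort(contribAbs[, "PC1"], decreasing = TRUE)

# Share of total variance carried by each PC, to judge which PCs matter.
varExplained <- pcaRes$sdev^2 / sum(pcaRes$sdev^2)
varExplained
```

The ranking on the last lines is probably the closest thing to "PC1 = 78% Var55 + 22% Var2000" from the question. Keep in mind that a large contribution to a PC only matters if that PC explains a meaningful share of the variance, which is what varExplained is there to check.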