
Z-standardization makes PC1 and PC2 exactly the same in this PCA analysis: Why?


I am trying to perform a PCA analysis using the psych package in R.

I have two variables that I want to combine into one component representing standard of living:

  • slvpens: Standard of living of pensioners: 0 = Extremely bad, 10 = Extremely good.
  • slvuemp: Standard of living of unemployed: 0 = Extremely bad, 10 = Extremely good.

slvpens:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.  Standard Deviation
  0.000   3.000   5.000   4.587   6.000  10.000             2.28857

slvuemp:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.  Standard Deviation
  0.000   3.000   4.000   4.095   5.000  10.000            2.099822

Using the psych package, I perform the analysis:

(slv_pca <- ESS %>% prcomp(
  formula = ~ slvpens + slvuemp, # Selecting variables
  data = ., na.action = na.exclude)) # Exclude NAs

With the following results:

Standard deviations (1, .., p=2):
[1] 2.651352 1.611470

Rotation (n x k) = (2 x 2):
               PC1        PC2
slvpens -0.7699869  0.6380597
slvuemp -0.6380597 -0.7699869

Everything is good. However, if I z-standardize the variables:

(slv_pca <- ESS %>% prcomp(
  formula = ~ slvpens + slvuemp, # Selecting variables
  data = ., na.action = na.exclude, # Exclude NAs
  center = TRUE, scale = TRUE)) # Z-standardize

The picture changes: PC1 and PC2 now have the same loadings (up to sign), and my two variables contribute exactly the same to each component.

Standard deviations (1, .., p=2):
[1] 1.2058739 0.7388289

Rotation (n x k) = (2 x 2):
               PC1        PC2
slvpens -0.7071068  0.7071068
slvuemp -0.7071068 -0.7071068

What is going on here?


Solution

  • The purpose of scaling and centering before PCA is to give your variables equal weight and to center your PC scores. Right now you have two variables that are already on the same scale.

    You don't need to scale; see my example below:

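    # Here I bin each iris column into 10 equal-width intervals coded 1 to 10
    scale_iris <- apply(iris[, 1:4], 2, function(i) as.numeric(cut(i, 10, labels = 1:10)))

    par(mfrow = c(1, 2))
    # Left: PCA on the original measurements, centered and scaled
    plot(prcomp(iris[, 1:4], scale. = TRUE, center = TRUE)$x[, 1:2],
         col = factor(iris$Species), main = "Actual iris PCA")
    # Right: PCA on the binned 1-to-10 values, centered only
    plot(prcomp(scale_iris, center = TRUE)$x[, 1:2],
         col = factor(iris$Species), main = "Scale iris PCA")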
    


    If there is information in the ordinal variables, and they are on the same scale, it will be captured by the PCA.

    Also note that by default prcomp() centers the data (as it should) and does not scale it unless you ask it to.
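
As for why the standardized loadings come out as ±0.7071068: with only two standardized variables, PCA diagonalizes the 2 x 2 correlation matrix [[1, r], [r, 1]]. For any nonzero correlation r, its eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), with eigenvalues 1 + |r| and 1 - |r|, so the loadings carry no information about the data beyond the sign of r. Below is a minimal sketch with simulated data; the two columns are hypothetical stand-ins for slvpens and slvuemp, not the ESS data.

```r
set.seed(1)
n <- 500

# Hypothetical stand-ins for slvpens and slvuemp:
# two correlated ratings, rounded and clipped to the 0-10 range
base    <- rnorm(n)
slvpens <- pmin(pmax(round(5 + 2.0 * base + rnorm(n, sd = 2)), 0), 10)
slvuemp <- pmin(pmax(round(4 + 1.5 * base + rnorm(n, sd = 2)), 0), 10)
dat     <- data.frame(slvpens, slvuemp)

r <- cor(dat$slvpens, dat$slvuemp)

pca_raw <- prcomp(dat)                 # defaults: centered, not scaled
pca_std <- prcomp(dat, scale. = TRUE)  # centered and z-standardized

# Unscaled loadings depend on the two variances and differ in magnitude
print(pca_raw$rotation)

# Standardized loadings all have magnitude 1/sqrt(2), whatever the data
print(abs(pca_std$rotation))

# Component variances are the eigenvalues of the correlation matrix
print(pca_std$sdev^2)
print(c(1 + abs(r), 1 - abs(r)))
```

The same check can be run on the output in the question: 1.2058739^2 is about 1.454 and 0.7388289^2 is about 0.546, which sum to 2 and would correspond to a correlation of roughly 0.45 between the two variables.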