I am trying to perform a PCA analysis using the psych package in R.
I have two variables that I want to combine into one component representing standard of living:
slvpens:
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. Std. Dev.
    0.000    3.000    5.000    4.587    6.000   10.000   2.28857
slvuemp:
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. Std. Dev.
    0.000    3.000    4.000    4.095    5.000   10.000  2.099822
Using the psych package, I perform the analysis:
(slv_pca <- ESS %>% prcomp(
  formula = ~ slvpens + slvuemp,        # Selecting variables
  data = ., na.action = na.exclude))    # Exclude NAs
With the following results:
Standard deviations (1, .., p=2):
[1] 2.651352 1.611470
Rotation (n x k) = (2 x 2):
               PC1        PC2
slvpens -0.7699869  0.6380597
slvuemp -0.6380597 -0.7699869
Everything is good. However, if I z-standardize the variables:
(slv_pca <- ESS %>% prcomp(
  formula = ~ slvpens + slvuemp,        # Selecting variables
  data = ., na.action = na.exclude,     # Exclude NAs
  center = TRUE, scale = TRUE))         # Z-standardize
The picture changes: the loadings on PC1 and PC2 are now all equal in absolute value. Does this mean my two variables contribute exactly the same to each component?
Standard deviations (1, .., p=2):
[1] 1.2058739 0.7388289
Rotation (n x k) = (2 x 2):
               PC1        PC2
slvpens -0.7071068  0.7071068
slvuemp -0.7071068 -0.7071068
What is going on here?
The purpose of scaling before PCA is to give your variables equal weight, and the purpose of centering is to center your PC scores. Right now you have two variables that are already on the same scale (both run from 0 to 10).
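Standardizing is also what produces the identical ±0.7071 loadings in the question. With two z-standardized variables, PCA works on the 2 x 2 correlation matrix with 1 on the diagonal and r off the diagonal, and its eigenvectors are always (1, 1)/sqrt(2) and (1, -1)/sqrt(2), with eigenvalues 1 + |r| and 1 - |r|, whatever the data are. The squared standard deviations in the question (about 1.454 and 0.546) sum to 2, which is consistent with a correlation of roughly 0.45. Below is a minimal sketch of this on simulated data; `x` and `y` are hypothetical stand-ins for slvpens and slvuemp, not the ESS variables.

```r
# Simulated stand-ins for the two survey items (hypothetical data, not the ESS)
set.seed(1)
x <- rnorm(500, mean = 4.6, sd = 2.3)
y <- 0.4 * x + rnorm(500, mean = 2.3, sd = 1.9)
dat <- data.frame(x = x, y = y)

# PCA on centered and z-standardized variables
pca_std <- prcomp(dat, center = TRUE, scale. = TRUE)
r <- cor(dat$x, dat$y)

# Every loading has absolute value 1/sqrt(2), regardless of r
abs(pca_std$rotation)
1 / sqrt(2)

# The component variances are 1 + |r| and 1 - |r|
pca_std$sdev^2
c(1 + abs(r), 1 - abs(r))
```

So with only two standardized variables the loadings carry no information about the data; only the standard deviations do.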
You don't need to scale; see my example below:
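# Bin each iris column into 10 equal-width intervals, coded 1 to 10
scale_iris <- apply(iris[, 1:4], 2, function(i) as.numeric(cut(i, 10, labels = 1:10)))
par(mfrow = c(1, 2))
# Left panel: PCA on the raw measurements, centered and scaled
plot(prcomp(iris[, 1:4], scale. = TRUE, center = TRUE)$x[, 1:2],
     col = factor(iris$Species), main = "Actual iris PCA")
# Right panel: PCA on the binned columns, centered only
plot(prcomp(scale_iris, center = TRUE)$x[, 1:2],
     col = factor(iris$Species), main = "Scale iris PCA")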
If there is information in the ordinal variables, and they are on the same scale, it will be captured by the PCA.
Also of note: by default, prcomp() centers the data (as it should) and does not scale it unless you specify that.
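A quick way to check those defaults, sketched here on the built-in iris data (the object names are only illustrative):

```r
X <- iris[, 1:4]

# Default call versus the same call with the defaults spelled out
pca_default  <- prcomp(X)
pca_explicit <- prcomp(X, center = TRUE, scale. = FALSE)

# Both calls give the same component standard deviations
all.equal(pca_default$sdev, pca_explicit$sdev)

# The fitted object records what was applied: column means used for
# centering, and FALSE when no scaling was done
pca_default$center
pca_default$scale

# Without scaling, the component variances are the eigenvalues
# of the covariance matrix of the data
all.equal(pca_default$sdev^2, eigen(cov(X))$values)
```

Because the unscaled PCA works on the covariance matrix, variables with larger variance get more weight, which is fine when the variables share a scale as they do in the question.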