This issue applies to my own data, but for the sake of reproducibility I'll use the example from the FactoExtra vignette, where the same question comes up.
To start, a simple PCA was generated (scale = TRUE) and the variable coordinates on the first 4 axes were extracted:
head(var$coord) # coordinates of variables
>                   Dim.1       Dim.2       Dim.3       Dim.4
> Sepal.Length  0.8901688 -0.36082989  0.27565767  0.03760602
> Sepal.Width  -0.4601427 -0.88271627 -0.09361987 -0.01777631
> Petal.Length  0.9915552 -0.02341519 -0.05444699 -0.11534978
> Petal.Width   0.9649790 -0.06399985 -0.24298265  0.07535950
This was also done for the "individuals." Here is the output:
head(ind$coord) # coordinates of individuals
>       Dim.1      Dim.2       Dim.3        Dim.4
> 1 -2.257141 -0.4784238  0.12727962  0.024087508
> 2 -2.074013  0.6718827  0.23382552  0.102662845
> 3 -2.356335  0.3407664 -0.04405390  0.028282305
> 4 -2.291707  0.5953999 -0.09098530 -0.065735340
> 5 -2.381863 -0.6446757 -0.01568565 -0.035802870
> 6 -2.068701 -1.4842053 -0.02687825  0.006586116
Since the PCA was generated with scale = TRUE, I'm confused as to why the individual coordinates are not scaled as well (to -1 to 1?). For instance, "individual 1" has a Dim.1 score of -2.257141, but I have no basis for comparing that with the variable coordinates, which range from -0.46 to 0.99. How can a score of -2.25 be interpreted when the variable coordinates of a scaled PCA lie between -1 and 1?
Am I missing something? Thanks for your time!
> library(factoextra)  # provides get_pca_ind() and get_pca_var()
> data(iris)
> res.pca <- prcomp(iris[, -5], scale = TRUE)  # drop the Species column
> ind <- get_pca_ind(res.pca)
> print(ind)
> var <- get_pca_var(res.pca)
> print(var)
I asked the author of FactoExtra this question. Here was his reply:
Scale = TRUE will normalize the variables to make them comparable. This is particularly recommended when variables are measured in different scales (e.g., kilograms, kilometers, centimeters, …); see http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/.
In this case, the correlation between a variable and a principal component (PC) is used as the coordinates of the variable on the PC. The representation of variables differs from the plot of the observations: The observations are represented by their projections, but the variables are represented by their correlations.
So, the coordinates of individuals are not expected to be between -1 and 1, even if scale = TRUE.
It’s only possible to interpret the relative position of individuals and variables by creating a biplot as described at: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/.
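The distinction in the reply (variables shown as correlations, individuals shown as projections) can be checked numerically. The following is a minimal sketch using base R only; it assumes the variable coordinates are the loadings multiplied by the component standard deviations, which is consistent with the values printed above. The names var_coord and var_cor are introduced here purely for illustration.

```r
# Base R only; factoextra is not needed for this check.
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# Variable coordinates: loadings scaled by each component's standard deviation.
var_coord <- sweep(res.pca$rotation, 2, res.pca$sdev, `*`)

# Correlations between the original variables and the component scores.
var_cor <- cor(iris[, -5], res.pca$x)

# TRUE if the coordinates and the correlations agree.
all.equal(unname(var_coord), unname(var_cor))

# Individual coordinates are projected scores: their spread on each axis is
# that component's standard deviation, not a fixed -1 to 1 range.
apply(res.pca$x, 2, range)
apply(res.pca$x, 2, sd)
res.pca$sdev
```

Because a correlation is bounded by -1 and 1, the variable coordinates stay in that range, while the scores of the individuals are free to fall outside it.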
A biplot isn't ideal for me, but I have tried rescaling the individual coordinates and it works. Also, I suppose I could take an individual and project it into the PCA to see where it falls.
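For anyone wanting to try the same two ideas, here is a rough sketch. It assumes one particular rescaling (dividing each axis by its largest absolute score), which may not be the rescaling used above, and the new_flower measurements are invented for illustration only.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale = TRUE)

# A hypothetical new individual (values made up for this example).
new_flower <- data.frame(Sepal.Length = 5.0, Sepal.Width = 3.4,
                         Petal.Length = 1.5, Petal.Width = 0.2)

# predict() centers and scales with the PCA's own parameters, then projects.
new_score <- predict(res.pca, newdata = new_flower)

# Assumed rescaling: divide each axis by its largest absolute score so the
# original individuals fall within -1 to 1. This is for display only.
max_abs <- apply(abs(res.pca$x), 2, max)
ind_rescaled <- sweep(res.pca$x, 2, max_abs, `/`)

# Apply the same divisors to the projected individual.
new_rescaled <- sweep(new_score, 2, max_abs, `/`)

range(ind_rescaled)
new_rescaled
```

Note that a projected individual lying outside the range of the original data could still land outside -1 to 1 after this rescaling.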
Anyways, that's the end of that. Thanks for your help @Hack-r!