r analytics pca spss

When doing PCA (Principal Component Analysis), how 'similar / compatible' does the data have to be?


I am trying to understand how compatible the data has to be when running PCA in SPSS or R. My data set contains information about wine from Portugal, and I know that some of the attributes are not comparable, for example pH, alcohol, and quality ranking.


If I normalize this data in R, would it be compatible with PCA? What I am trying to achieve is to understand which attributes make quality higher (it has to be PCA, though). I am sorry if this question is stupid; I'm a data analytics student and, due to the coronavirus situation, classes are not being delivered, yet I still have to deliver a CA that I have no idea how to start. Thank you!


Solution

  • In SPSS FACTOR, you can run PCA on either the correlation matrix or the covariance matrix. If you use the covariance matrix, variables with larger variances (typically those measured on larger scales) will tend to dominate the solution. If you use the correlation matrix (the default), each variable is standardized to unit variance, so the solution differs and variables with larger original scales will not necessarily dominate it.
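
The same covariance-versus-correlation choice can be sketched in R with `prcomp()` from the base `stats` package, where `scale. = TRUE` corresponds to working from the correlation matrix. This is only a minimal illustration: the `wine` data frame, its column names, and its values are made up here and merely stand in for the real attributes.

```r
# Hypothetical toy data standing in for the wine attributes
# (column names and values are invented for illustration).
set.seed(1)
wine <- data.frame(
  pH                   = rnorm(100, mean = 3.3,  sd = 0.15),
  alcohol              = rnorm(100, mean = 10.5, sd = 1.1),
  total_sulfur_dioxide = rnorm(100, mean = 46,   sd = 33)
)

# PCA on the covariance matrix: variables keep their original scales,
# so the attribute with the largest variance drives the first component.
pca_cov <- prcomp(wine, center = TRUE, scale. = FALSE)

# PCA on the correlation matrix: each variable is standardized to
# unit variance before the components are extracted.
pca_cor <- prcomp(wine, center = TRUE, scale. = TRUE)

# Compare the loadings of the first component under both choices.
print(round(pca_cov$rotation[, 1], 3))
print(round(pca_cor$rotation[, 1], 3))
```

With attributes measured in units as different as pH and sulfur dioxide concentration, the standardized version (`scale. = TRUE`) is usually the more sensible starting point, for the reason given in the answer.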