
Principal component analysis on a correlation matrix


Many functions in R can perform Principal Component Analysis (PCA) on raw data. By raw data I mean any data frame or matrix whose rows correspond to observations and whose columns correspond to measurements. Can PCA be carried out on a correlation matrix in R? Which function accepts a correlation matrix as its input?


Solution

  • As mentioned in the comments, you can use

    ii <- as.matrix(iris[,1:4])
    princomp(covmat=cor(ii))
    

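    This will give you results equivalent to princomp(ii,cor=TRUE). That second form is not what you want here, though: it needs the full data matrix and converts its covariance matrix to a correlation matrix internally, whereas the covmat form needs only the correlation matrix itself. (Note that princomp(iris,cor=TRUE) on the full data frame fails because of the non-numeric Species column.)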
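
    As a quick check of that equivalence, here is a minimal sketch that fits both forms on the numeric iris columns and compares them. The object names p_cor, p_data, same_sdev and same_loadings are only illustrative; loadings are compared in absolute value because the sign of each component is arbitrary.

    ```r
    ii <- as.matrix(iris[, 1:4])

    # PCA from the correlation matrix alone (no raw data supplied)
    p_cor <- princomp(covmat = cor(ii))

    # PCA from the raw data, using its correlation matrix internally
    p_data <- princomp(ii, cor = TRUE)

    # Compare standard deviations of the components
    same_sdev <- all.equal(unname(p_cor$sdev), unname(p_data$sdev))

    # Compare loadings up to sign
    same_loadings <- all.equal(abs(unclass(p_cor$loadings)),
                               abs(unclass(p_data$loadings)),
                               check.attributes = FALSE)

    print(same_sdev)
    print(same_loadings)

    # Without raw data there is nothing to project, so no scores are stored
    print(is.null(p_cor$scores))
    ```

    The last line illustrates the practical difference between the two calls: a fit built from covmat alone cannot return scores.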


    You can also do all the relevant computations by hand if you have the correlation matrix:

    cc <- cor(ii)
    e1 <- eigen(cc)
    

    Standard deviations:

    sqrt(e1$values)
    [1] 1.7083611 0.9560494 0.3830886 0.1439265
    

    Proportion of variance:

    e1$values/sum(e1$values)
    [1] 0.729624454 0.228507618 0.036689219 0.005178709
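
    The by-hand quantities above can be collected and cross-checked against princomp() in a few lines. This is only a sketch; sdev_hand, prop_var, cum_var and p are illustrative names, not part of any API.

    ```r
    cc <- cor(as.matrix(iris[, 1:4]))
    e1 <- eigen(cc)

    # Standard deviations of the components: square roots of the eigenvalues
    sdev_hand <- sqrt(e1$values)

    # Proportion and cumulative proportion of variance per component
    prop_var <- e1$values / sum(e1$values)
    cum_var  <- cumsum(prop_var)

    # The same standard deviations as reported by princomp() on the correlation matrix
    p <- princomp(covmat = cc)
    print(all.equal(sdev_hand, unname(p$sdev)))

    print(round(cbind(sdev = sdev_hand, proportion = prop_var, cumulative = cum_var), 4))
    ```

    Because the eigenvalues of a correlation matrix sum to the number of variables, the denominator here is simply 4.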
    

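    You can get the loadings via e1$vectors. If you also have the raw data, compute the scores (according to this CV question) via scale(ii) %*% e1$vectors. Use the standardized numeric columns rather than as.matrix(iris), which includes the non-numeric Species column. This will not give numerically identical answers to princomp()$scores, because princomp() standardizes with divisor n rather than n - 1 and the sign of each eigenvector is arbitrary, but it gives equivalent results.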
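
    To make the score computation concrete, the sketch below assumes the raw data are available, standardizes them with scale(), and projects them onto the eigenvectors. The names scores_hand, adj and max_diff are illustrative. The rescaling factor reflects the documented use of divisor n in princomp(), while scale() uses n - 1; absolute values are compared because component signs are arbitrary.

    ```r
    ii <- as.matrix(iris[, 1:4])
    e1 <- eigen(cor(ii))

    # Scores by hand: standardize each column, then project onto the eigenvectors
    scores_hand <- scale(ii) %*% e1$vectors

    # Reference scores from princomp() on the same data
    p <- princomp(ii, cor = TRUE)

    # princomp() standardizes with divisor n, scale() with divisor n - 1
    n   <- nrow(ii)
    adj <- sqrt(n / (n - 1))

    # Largest discrepancy after rescaling, ignoring sign
    max_diff <- max(abs(abs(scores_hand) * adj - abs(unclass(p$scores))))
    print(max_diff)

    # The hand-computed scores have standard deviations sqrt(e1$values)
    print(all.equal(unname(apply(scores_hand, 2, sd)), sqrt(e1$values)))
    ```

    If only the correlation matrix is available and the raw data are not, the scores cannot be recovered; the standard deviations and loadings are all that PCA can provide in that case.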