How to use the princomp() function in R when the covariance matrix has zeros?


While using the princomp() function in R, I encounter the following error: "covariance matrix is not non-negative definite".

I think this is due to some values in the covariance matrix being zero (actually close to zero, but they become zero during rounding).

Is there a workaround to proceed with PCA when the covariance matrix contains zeros?

[FYI: obtaining the covariance matrix is an intermediate step within the princomp() call. A data file to reproduce this error can be downloaded from here: http://tinyurl.com/6rtxrc3]


Solution

  • The first strategy might be to decrease the tolerance. As far as I can tell, princomp does not pass on a tolerance argument, but prcomp does accept a 'tol' argument. If that is not effective, this should identify pairs of variables which have nearly zero covariance:

    nr0=0.001
    which(abs(cov(M)) < nr0, arr.ind=TRUE)
    

    And this would identify negative eigenvalues of the covariance matrix:

    which(eigen(cov(M))$values < 0)
    
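The eigenvalue check can be made concrete with a small sketch. Everything in it is an assumption for illustration: `X` is made-up data whose fifth column is the sum of the first two, and `tol` is an arbitrary relative threshold, not a value taken from princomp. The eigenvalues are computed from the covariance matrix rather than from the raw data.

```r
# Hypothetical data: 50 observations, 5 variables, the last one redundant.
set.seed(42)
X <- matrix(rnorm(50 * 4), nrow = 50)
X <- cbind(X, X[, 1] + X[, 2])

S  <- cov(X)
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values  # sorted, largest first

# Relative threshold (an assumption): flag eigenvalues that are negative
# or negligible compared with the largest one.
tol <- 1e-8
suspect <- which(ev < tol * ev[1])

print(ev)
print(suspect)
```

Comparing against the largest eigenvalue, rather than against an absolute cutoff, keeps the check independent of the scale of the variables.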

    Using the h9 example on the help(qr) page:

    > which(abs(cov(h9)) < .001, arr.ind=TRUE)
          row col
     [1,]   9   4
     [2,]   8   5
     [3,]   9   5
     [4,]   7   6
     [5,]   8   6
     [6,]   9   6
     [7,]   6   7
     [8,]   7   7
     [9,]   8   7
    [10,]   9   7
    [11,]   5   8
    [12,]   6   8
    [13,]   7   8
    [14,]   8   8
    [15,]   9   8
    [16,]   4   9
    [17,]   5   9
    [18,]   6   9
    [19,]   7   9
    [20,]   8   9
    [21,]   9   9
    > qr(h9[-9,-9])$rank  
    [1] 7                  # rank deficient, at least at the default tolerance
    > qr(h9[-(8:9),-(8:9)])$rank  # take out only the vector with the most dependencies
    [1] 6                   # still rank deficient
    > qr(h9[-(7:9),-(7:9)])$rank
    [1] 6
    
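The rank checks above remove columns by hand. A minimal sketch of automating that step follows, assuming made-up data `X` with one redundant column; the names `Xc`, `keep` and `X_reduced` are hypothetical. It relies on the `rank` and `pivot` components returned by qr() with its default method, which moves columns it judges to be dependent to the end of the pivot order.

```r
# Hypothetical data: the fifth column is an exact linear combination.
set.seed(42)
X <- matrix(rnorm(50 * 4), nrow = 50)
X <- cbind(X, X[, 1] + X[, 2])
colnames(X) <- paste0("V", 1:5)

# Center first, because PCA works on the covariance of centered data.
Xc <- scale(X, center = TRUE, scale = FALSE)

q    <- qr(Xc)                           # default tolerance is 1e-07
keep <- sort(q$pivot[seq_len(q$rank)])   # columns judged linearly independent
X_reduced <- X[, keep, drop = FALSE]

pc <- princomp(X_reduced)                # PCA on the remaining columns
print(colnames(X_reduced))
```

Which column of a dependent group gets dropped depends on column order, so it may be worth reordering the variables so that the ones you care about most come first.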

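    Another approach might be to use the alias function, which reports linearly dependent terms in a model fitted to a dummy response (here dfrm is assumed to be a data frame of the variables):

    alias( lm( rnorm(NROW(dfrm)) ~ ., data = dfrm ) )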
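The first strategy in the answer, switching from princomp to prcomp with a 'tol' argument, might look like the sketch below. The data `X` and the value `tol = 1e-8` are assumptions chosen for illustration. According to its documentation, prcomp works through a singular value decomposition of the centered data rather than eigen on the covariance matrix, and omits components whose standard deviation is at most `tol` times that of the first component.

```r
# Hypothetical data: 5 variables, the last one an exact sum of the first two.
set.seed(42)
X <- matrix(rnorm(50 * 4), nrow = 50)
X <- cbind(X, X[, 1] + X[, 2])

# SVD-based PCA; near-zero components are dropped according to tol.
pc <- prcomp(X, center = TRUE, scale. = FALSE, tol = 1e-8)

print(ncol(pc$rotation))   # number of components kept after applying tol
print(pc$sdev)
```

The scores for the retained components are in `pc$x`, which plays the role of the `scores` element returned by princomp.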