matlab, math, matlab-guide, data-analysis

How to calculate the covariance matrix in blocks?


MATLAB has a built-in function, "cov", that calculates the covariance matrix of a given matrix C. If C is too big, for example a 1000*60000 double matrix, my computer does not have enough RAM to run "cov" on it, so the covariance matrix has to be calculated in blocks or pieces. My question is: how do I calculate the covariance matrix in blocks/pieces?


Solution

  • Assuming you mean that you have 60,000 observations of 1,000 variables, you can compute the covariance matrix in chunks and combine them as you go:

    1. Partition your observations into chunks of size N. (N has to be chosen so that one chunk fits within your RAM.)
    2. Compute the covariance of the k-th chunk.
    3. Combine the covariance of the k-th chunk with the total covariance of the previous k-1 chunks.
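
The three steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the data matrix `X`, the chunk size, and all variable names are assumptions made up for the example, and the merge step uses the standard pairwise update of a running mean and scatter matrix (the scatter matrix of a chunk is its covariance multiplied by the number of observations minus one), which is one way of doing the combination.

```matlab
% Hypothetical small example: rows are observations, columns are variables.
% For real data, each chunk would be read from disk instead of sliced from X.
X = randn(1000, 5);
chunkSize = 128;                 % N: choose so that one chunk fits in RAM

[numObs, numVars] = size(X);
n  = 0;                          % number of observations processed so far
mu = zeros(1, numVars);          % running mean of all processed observations
M2 = zeros(numVars, numVars);    % running scatter: sum of (x - mu)' * (x - mu)

for first = 1:chunkSize:numObs
    % Step 1: take the next chunk of at most chunkSize observations.
    last  = min(first + chunkSize - 1, numObs);
    chunk = X(first:last, :);

    % Step 2: mean and scatter matrix of the current chunk.
    nK  = size(chunk, 1);
    muK = mean(chunk, 1);
    centered = bsxfun(@minus, chunk, muK);
    M2K = centered' * centered;  % equals (nK - 1) * cov(chunk) when nK > 1

    % Step 3: merge the chunk statistics into the running totals.
    delta = muK - mu;
    nNew  = n + nK;
    M2 = M2 + M2K + (delta' * delta) * (n * nK / nNew);
    mu = mu + delta * (nK / nNew);
    n  = nNew;
end

covBlocked = M2 / (n - 1);       % same normalization as cov(X)
```

Only one chunk plus the running 1,000-by-1,000 totals need to be in memory at a time. If the data lives in a MAT-file, `matfile` is one way to read a range of rows without loading the whole variable, assuming the file was saved in a format that supports partial loading.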

    To combine the covariance matrices, keep track of both the covariance and the mean of each chunk as you process it. The chunks can then be combined by exploiting the representation of the covariance as the mean of the squares minus the square of the means, cov(X) = E[X X'] - E[X] E[X]'.
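
Written out, the combination might look like the following. The symbols are assumptions introduced for this sketch: chunk k has n_k observations, mean vector m_k, and covariance S_k normalized by n_k - 1 (the default normalization of "cov").

```latex
% Each chunk's raw second moment, recovered from its covariance and mean:
\sum_{i \in \text{chunk } k} x_i x_i^{\top} = (n_k - 1)\, S_k + n_k\, m_k m_k^{\top}

% Totals over all chunks:
n = \sum_k n_k, \qquad m = \frac{1}{n} \sum_k n_k\, m_k

% Overall covariance: sum of squares minus n times the square of the overall mean:
S = \frac{1}{n - 1} \left[ \sum_k \Big( (n_k - 1)\, S_k + n_k\, m_k m_k^{\top} \Big) - n\, m\, m^{\top} \right]
```

Subtracting two large, nearly equal quantities can lose precision in floating point, so an incremental update of the mean and the centered sums is often preferred in practice; the algebra is the same.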