I have a sparse matrix A(m,n), where the n columns are the variables and the m rows are the observations. I want to compute the Pearson correlation among all n variables.
Some observations are missing. For example, if A(2,3) is not available, I have no such observation, so to compute the correlation between column 3 and column 4 I must discard row 2, even if A(2,4) is available. This is how Pearson correlation is normally computed.
In MATLAB, however, the function corrcoef() considers all the values, including the missing ones (which are treated as zeros). Is there a simple way to avoid this? A very similar question is available here: "Pearson Correlation without using zero element in Matlab", but a working solution is provided only for comparing two vectors, not for a generic matrix A(m,n) where n > 2.
Jonas's answer on the question you linked works for you as well if you generalise it:
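Col1 = 2;
Col2 = 3;
A = magic(3); A(1,1) = 0;   % example data; a zero marks a missing value
% keep only the rows where both columns have an observation
gooddata = A(:,Col1)~=0 & A(:,Col2)~=0;
pearson = corr(A(gooddata,Col1), A(gooddata,Col2));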
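Looped over every pair of columns, that becomes:
n = size(A,2);
pearson = NaN(n);   % preallocate the n-by-n result
for ii = 1:n
    for jj = ii:n
        % rows where both columns have an observation
        gooddata = A(:,ii)~=0 & A(:,jj)~=0;
        % full() is harmless for a dense A and avoids passing sparse input to corr
        pearson(ii,jj) = corr(full(A(gooddata,ii)), full(A(gooddata,jj)));
        pearson(jj,ii) = pearson(ii,jj);   % the matrix is symmetric
    end
end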
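If you can afford a dense copy of A, another option is to mark the missing entries as NaN and let corrcoef do the pairwise deletion itself through its 'Rows' option. This is a different technique from the loop above, not a rewrite of it. A minimal sketch, assuming every zero in A denotes a missing observation (a genuine zero measurement would be discarded too); the toy matrix and the names X and R are made up for illustration:

```matlab
% Toy data (hypothetical): rows are observations, columns are variables,
% and a zero marks a missing observation.
A = sparse([1 2 0; 2 4 1; 3 0 2; 4 8 4; 5 10 3]);

X = full(A);        % dense copy, so that missing entries can hold NaN
X(X == 0) = NaN;    % assumption: every zero is a missing value

% For each pair of columns, use only the rows where both values are present.
R = corrcoef(X, 'Rows', 'pairwise');
```

R(ii,jj) should then match what the loop computes for columns ii and jj, without needing corr from the Statistics Toolbox. The cost is the dense copy, which may be too large if A is big and very sparse; in that case the loop is the safer choice.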