matlab performance regression vectorization covariance

Vectorize a regression map calculation

I compute the regression map of a time series A(t) on a field B(x,y,t) in the following way:

A=1:10; %time
B=rand(100,100,10); %x,y,time

rc=nan(size(B,1),size(B,2));
for ii=size(B,1)
  for jj=1:size(B,2)
     tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
     rc(ii,jj) = tmp(1,2); %covariance A and B
  end
end
rc = rc/var(A); %regression coefficient

Is there a way to vectorize/speed up code? Or maybe some built-in function that I did not know of to achieve the same result?

Solution

In order to vectorize this algorithm, you would have to "get your hands dirty" and compute the covariance yourself. If you take a look inside cov you'll see that it has many lines of input checking and very few lines of actual computation, to summarize the critical steps:

y = varargin{1};
x = x(:);
y = y(:);
x = [x y];
[m,~] = size(x);
denom = m - 1;
xc = x - sum(x,1)./m;  % Remove mean
c = (xc' * xc) ./ denom;

To simplify the above somewhat:

x = [x(:) y(:)];
m = size(x,1);
xc = x - sum(x,1)./m;
c = (xc' * xc) ./ (m - 1);

Now this is something that is fairly straightforward to vectorize...

function q51466884
A = 1:10; %time
B = rand(200,200,10); %x,y,time
%% Test Equivalence:
assert( norm(sol1-sol2) < 1E-10);
%% Benchmark:
disp([timeit(@sol1), timeit(@sol2)]);

%%
function rc = sol1()
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
  for jj=1:size(B,2)
     tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
     rc(ii,jj) = tmp(1,2); %covariance A and B
  end
end
rc = rc/var(A); %regression coefficient
end

function rC = sol2()  
m = numel(A);
rB = reshape(B,[],10).'; % reshape
% Center:
cA = A(:) - sum(A)./m;
cB = rB - sum(rB,1)./m;
% Multiply:
rC = reshape( (cA.' * cB) ./ (m-1), size(B(:,:,1)) ) ./ var(A);
end

end

I get these timings: [0.5381 0.0025] which means we saved two orders of magnitude in the runtime :)

Note that a big part of optimizing the algorithm is assuming you don't have any "strangeness" in your data, like NaN values etc. Take a look inside cov.m to see all the checks that we skipped.