Search code examples
matlabperformanceregressionvectorizationcovariance

Vectorize a regression map calculation


I compute the regression map of a time series A(t) on a field B(x,y,t) in the following way:

A=1:10; %time
B=rand(100,100,10); %x,y,time

rc=nan(size(B,1),size(B,2));
for ii=size(B,1)
  for jj=1:size(B,2)
     tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
     rc(ii,jj) = tmp(1,2); %covariance A and B
  end
end
rc = rc/var(A); %regression coefficient

Is there a way to vectorize/speed up code? Or maybe some built-in function that I did not know of to achieve the same result?


Solution

  • In order to vectorize this algorithm, you would have to "get your hands dirty" and compute the covariance yourself. If you take a look inside cov you'll see that it has many lines of input checking and very few lines of actual computation, to summarize the critical steps:

    y = varargin{1};
    x = x(:);
    y = y(:);
    x = [x y];
    [m,~] = size(x);
    denom = m - 1;
    xc = x - sum(x,1)./m;  % Remove mean
    c = (xc' * xc) ./ denom;
    

    To simplify the above somewhat:

    x = [x(:) y(:)];
    m = size(x,1);
    xc = x - sum(x,1)./m;
    c = (xc' * xc) ./ (m - 1);
    

    Now this is something that is fairly straightforward to vectorize...

    function q51466884
    A = 1:10; %time
    B = rand(200,200,10); %x,y,time
    %% Test Equivalence:
    assert( norm(sol1-sol2) < 1E-10);
    %% Benchmark:
    disp([timeit(@sol1), timeit(@sol2)]);
    
    %%
    function rc = sol1()
    rc=nan(size(B,1),size(B,2));
    for ii=1:size(B,1)
      for jj=1:size(B,2)
         tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
         rc(ii,jj) = tmp(1,2); %covariance A and B
      end
    end
    rc = rc/var(A); %regression coefficient
    end
    
    function rC = sol2()  
    m = numel(A);
    rB = reshape(B,[],10).'; % reshape
    % Center:
    cA = A(:) - sum(A)./m;
    cB = rB - sum(rB,1)./m;
    % Multiply:
    rC = reshape( (cA.' * cB) ./ (m-1), size(B(:,:,1)) ) ./ var(A);
    end
    
    end
    

    I get these timings: [0.5381 0.0025] which means we saved two orders of magnitude in the runtime :)

    Note that a big part of optimizing the algorithm is assuming you don't have any "strangeness" in your data, like NaN values etc. Take a look inside cov.m to see all the checks that we skipped.