Search code examples
matlabvectorizationmatrix-indexing

Summing array by indices from another cell array


I have an array:

a = [109, 894, 566, 453, 342, 25]

and another cell array of sub-indices of a, denoted as:

subs = { [1,3,4], [2,5,6], [1,3], [3,4], [2,3,4], [6] };    

I want to avoid the for-loop to calculate the following summations via MATLAB:

for i=1:6
    sums_a(i) = sum(a(subs{i}));
end

Is there any fast way such as arrayfun to implement this? Thanks.


Solution

  • If you're looking for speed, arrayfun can be rather slow. As commented by Andrew Horchler, in latest releases of MATLAB for loops can be pretty fast thanks to JIT acceleration. If you still insist on avoiding the loop, here's a tricky solution without for loops that employs accumarray:

    idx = cumsum(cellfun('length', subs));
    x = diff(bsxfun(@ge, [0; idx(:)], 1:max(idx)));
    x = sum(bsxfun(@times, x', 1:numel(subs)), 2);  %'// Produce subscripts
    y = a([subs{:}]);                               % // Obtain values
    sums_a = accumarray(x, y);                      % // Accumulate values
    

    This can be written as a (rather long) one-liner actually, but it's split into several lines for clarity.

    Explanation

    This values to be accumulated are obtained like so:

    y = a([subs{:}]);
    

    In your example their corresponding indices should be:

    1    1    1    2    2    2    3    3    4    4    5    5    5    6
    

    That is:

    1. The first 3 values from y are accumulated and the result is stored as the first element in the output.
    2. The next 3 values are accumulated and the result is stored as the second element in the output.
    3. The next 2 values are accumulated and the result is stored as the third element in the output.

    and so on...

    The following lines do the magic to produce such a vector of indices x:

    idx = cumsum(cellfun('length', subs));
    x = diff(bsxfun(@ge, [0; idx(:)], 1:max(idx)));
    x = sum(bsxfun(@times, x', 1:numel(subs)), 2);
    

    Lastly, x and y are fed into accumarray:

    sums_a = accumarray(x, y);
    

    and voila.

    Benchmark

    Here's the benchmarking code:

    a = [109,894,566,453,342,25];
    subs = {[1,3,4], [2,5,6], [1,3], [3,4], [2,3,4], 6};
    
    % // Solution with arrayfun
    tic
    for k = 1:1000
        clear sums_a1
        sums_a1 = cellfun( @(subs) sum( a(subs) ), subs );
    end
    toc
    
    % // Solution with accumarray
    tic
    for k = 1:1000
        clear sums_a2
        idx = cumsum(cellfun('length', subs));
        x = diff(bsxfun(@ge, [0; idx(:)], 1:max(idx)));
        x = sum(bsxfun(@times, x', 1:numel(subs)), 2);
        sums_a2 = accumarray(x, a([subs{:}]));
    end
    toc
    
    %'// Solution with for loop
    tic
    for k = 1:1000
        clear sums_a3
        for n = 1:6
            sums_a3(n) = sum(a(subs{n}));
        end
    end
    toc
    

    The results on my machine are:

    Elapsed time is 0.027746 seconds.
    Elapsed time is 0.004228 seconds.
    Elapsed time is 0.001959 seconds.
    

    There's an almost tenfold speedup for accumarray vs. arrayfun, but notice that the for loop still beats both.