Search code examples
performancematlabbooleancomparison-operators

A faster alternative to all(a(:,i)==a,1) in MATLAB


It is a straightforward question: Is there a faster alternative to all(a(:,i)==a,1) in MATLAB?

I'm thinking of a implementation that benefits from short-circuit evaluations in the whole process. I mean, all() definitely benefits from short-circuit evaluations but a(:,i)==a doesn't.

I tried the following code,

% example for the input matrix

m = 3;       % m and n aren't necessarily equal to those values.
n = 5000;    % It's only possible to know in advance that 'm' << 'n'.

a = randi([0,5],m,n); % the maximum value of 'a' isn't necessarily equal to 
                      % 5 but it's possible to state that every element in 
                      % 'a' is a positive integer.

% all, equal solution

tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
    %%%%%%%%%% all and equal solution %%%%%%%%%
    ax_boo = all(a(:,i)==a,1);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc

% alternative solution

tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
    %%%%%%%%%%% alternative solution %%%%%%%%%%%
    ax_boo = a(1,i) == a(1,:);
    for k = 2:m
        ax_boo(ax_boo) = a(k,i) == a(k,ax_boo);
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc

but it's intuitive that any "for-loop-solution" within the MATLAB environment will be naturally slower. I'm wondering if there is a MATLAB built-in function written in a faster language.

EDIT:

After running more tests I found out that the implicit expansion does have a performance impact in evaluating a(:,i)==a. If the matrix a has more than one row, all(repmat(a(:,i),[1,n])==a,1) may be faster than all(a(:,i)==a,1) depending on the number of columns (n). For n=5000 repmat explicit expansion has proved to be faster.

But I think that a generalization of Kenneth Boyd's answer is the "ultimate solution" if all elements of a are positive integers. Instead of dealing with a (m x n matrix) in its original form, I will store and deal with adec (1 x n matrix):

exps = ((0):(m-1)).';
base = max(a,[],[1,2]) + 1;
adec = sum( a .* base.^exps , 1 );

In other words, each column will be encoded to one integer. And of course adec(i)==adec is faster than all(a(:,i)==a,1).

EDIT 2:

I forgot to mention that adec approach has a functional limitation. At best, storing adec as uint64, the following inequality must hold base^m < 2^64 + 1.


Solution

  • Since your goal is to count the number of columns that match, my example converts the binary encoding to integer decimals, then you just loop over the possible values (with 3 rows that are 8 possible values) and count the number of matches.

    a_dec = 2.^(0:(m-1)) * a;
    num_poss_values = 2 ^ m;
    num_matches = zeros(num_poss_values, 1);
    for i = 1:num_poss_values
       num_matches(i) = sum(a_dec == (i - 1));
    end
    

    On my computer, using 2020a, Here are the execution times for your first 2 options and the code above:

    Elapsed time is 0.246623 seconds.
    Elapsed time is 0.553173 seconds.
    Elapsed time is 0.000289 seconds.
    

    So my code is 853 times faster!

    I wrote my code so it will work with m being an arbitrary integer.

    The num_matches variable contains the number of columns that add up to 0, 1, 2, ...7 when converted to a decimal.