I have an array of numbers:
>> A = [2 2 2 2 1 3 4 4];
And I want to find the array indices where each number can be found:
>> B = arrayfun(@(x) {find(A==x)}, 1:4);
In other words, this B
should tell me:
>> for ii=1:4, fprintf('Item %d in location %s\n',ii,num2str(B{ii})); end
Item 1 in location 5
Item 2 in location 1 2 3 4
Item 3 in location 6
Item 4 in location 7 8
It's like the 2nd output argument of unique
, but instead of the first (or last) occurrence, I want all the occurrences. I think this is called a reverse lookup (where the original key is the array index), but please correct me if I'm wrong.
What I have above gives the correct answer, but it scales terribly with the number of unique values. For a real problem (where A
has 10M elements with 100k unique values), even this stupid for loop is 100x faster:
>> B = cell(max(A),1);
>> for ii=1:numel(A), B{A(ii)}(end+1)=ii; end
But I feel like this can't possibly be the best way to do it.
We can assume that A
contains only integers from 1 to the max (because if it doesn't, I can always pass it through unique
to make it so).
That's a simple task for accumarray
:
out = accumarray(A(:),(1:numel(A)).',[],@(x) {x}) %'
out{1} = 5
out{2} = 3 4 2 1
out{3} = 6
out{4} = 8 7
However accumarray
suffers from not being stable (in the sense of unique
's feature), so you might want to have a look here for a stable version of accumarray, if that's a problem.
Above solution also assumes A
to be filled with integers, preferably with no gaps in between. If that is not the case, there is no way around a call of unique
in advance:
A = [2.1 2.1 2.1 2.1 1.1 3.1 4.1 4.1];
[~,~,subs] = unique(A)
out = accumarray(subs(:),(1:numel(A)).',[],@(x) {x})
To sum up, the most generic solution, working with floats and returning a sorted output could be:
[~,~,subs] = unique(A)
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1)); %// optional
vals = 1:numel(A);
vals = vals(I); %// optional
out = accumarray(subs, vals , [],@(x) {x});
out{1} = 5
out{2} = 1 2 3 4
out{3} = 6
out{4} = 7 8
function [t] = bench()
%// data
a = rand(100);
b = repmat(a,100);
A = b(randperm(10000));
%// functions to compare
fcns = {
@() thewaywewalk(A(:).');
@() cst(A(:).');
};
% timeit
t = zeros(2,1);
for ii = 1:100;
t = t + cellfun(@timeit, fcns);
end
format long
end
function out = thewaywewalk(A)
[~,~,subs] = unique(A);
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1));
idx = 1:numel(A);
out = accumarray(subs, idx(I), [],@(x) {x});
end
function out = cst(A)
[B, IX] = sort(A);
out = mat2cell(IX, 1, diff(find(diff([-Inf,B,Inf])~=0)));
end
0.444075509687511 %// thewaywewalk
0.221888202987325 %// CST-Link
Surprisingly the version with stable accumarray
is faster than the unstable one, due to the fact that Matlab prefers sorted arrays to work on.