I have a process that is iteratively and randomly pruning a huge vector of integers and I want to find what elements are removed between each iteration. This vector has a lot of repetitions and using ismember() and setdiff() doesn't helped me much.
As an illustration if X = [1,10,8,5,10,3,5,2]:
step 0: X = 1,10,8,5,10,3,5,2
step 1: X = 1,10,8,10,3,5,2 (5 is removed)
step 2: X = 1,10,8,3,2 (10 and 5 are removed)
step 3: X = 10,8,3,2 (1 is removed)
step 4: X = 2 (10, 8 and 3 are removed)
step 5: X = [] (2 is finally removed)
I aim at finding the elements removed at each steps (ie. 5 then, 10 and 5 and so on). I could possibly find an overly complicated solution using hist(X, unique(X))
between steps, but I assume there exists a much more elegant (and cheaper!) solution in matlab.
This approach is memory-intensive. It computes an intermediate matrix of size N
xM
where N
is the number of elements of X
and M
is the number of unique elements of X
, using implicit expansion. This may be feasible or not depending on your typical N
and M
.
X = [1,10,8,5,10,3,5,2];
Y = [8,10,2,1]; % removed 10, 5, 5, 3. Order in Y is arbitrary
u = unique(X(:).');
removed = repelem(u, sum(X(:)==u,1)-sum(Y(:)==u,1));
gives
removed =
3 5 5 10
For Matlab versions before R2016b, you need bsxfun
instead of implicit expansion:
removed = repelem(u, sum(bsxfun(@eq,X(:),u),1)-sum(bsxfun(@eq,Y(:),u),1));
If the values in X
are always positive integers, a more efficient approach can be used, employing sparse
to compute the number of times each element appears:
X = [1,10,8,5,10,3,5,2];
Y = [8,10,2,1]; % removed 10, 5, 5, 3. Order in Y is arbitrary
removed = repelem(1:max(X), sparse(1,X,1) - sparse(1,Y,1));