Search code examples
arraysalgorithmmatlabcell-arraymatrix-indexing

How to remove cell elements that are a subset of a larger element? (Matlab)


For example, if I have a cell with arrays listed inside:

C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]}

I want to output:

D = {[1,2,3,4], [4,5,6], [7]}

What is the most efficient way to remove cell elements that are already included/subset in another larger element?

My existing algorithm loops through each element, compares it to each element in the new cell list and updates the new cell list accordingly but it is extremely slow and inefficient (my original cell contains >200 array elements).


Solution

  • Here is an approach using ,very fast, matrix multiplication. You can convert C to a ,sparse, binary matrix b that each element of C is related to each row of the matrix. So columns [1 2 3 4] of the first row and columns [3 4] of the second row are set to 1 and so on. Multiplying the matrix by its transpose we can find the number of corresponding elements between all pairs of rows. using that information we can find cell elements that are subset of others.

    C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]};
    n = cellfun(@numel,C);      % find length of each element.
    v = repelem(1:numel(C),n);  % generate indices for rows of the binary matrix
    [~,~,u] = unique([C{:}]);   % generate indices for rows of the binary matrix
    b = accumarray([v(:),u(:)],ones(size(v)),[],@max,[],true); % generate the binary matrix
    s = b * b.';                % multiply by its transpose
    s(1:size(s,1)+1:end) = 0;   % set diagonal elements to 0(we do not need self similarity)
    result = C(max(s)<n)        % remove an element if it is a subset and preserve others