
Matlab: How to remove cell elements which have other sets as subsets


I have a cell with arrays listed inside:

C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]}

I want to output:

D = {[3,4], [2], [4,5], [7]}

The sets in D are the only sets in C that do not contain any other set of C as a subset.

Please see the following link for a similar question. Although the solution there is elegant, I have not (yet) been able to modify its code to fit my particular question.

I would appreciate any help with a solution.

Thank you!


Solution

  • As in the linked post, you can form the matrix s whose entry s(i,j) is the number of elements shared by sets i and j. Set i is a subset of set j exactly when s(i,j) equals the size of set i, so the result would be:

    C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]};
    n = cellfun(@numel,C);      % find length of each element.
    v = repelem(1:numel(C),n);  % generate indices for rows of the binary matrix
    [~,~,u] = unique([C{:}]);   % generate indices for columns of the binary matrix
    b = accumarray([v(:),u(:)],ones(size(v)),[],@max,[],true); % generate the binary matrix
    s = b * b.';                % multiply by its transpose
    s(1:size(s,1)+1:end) = 0;   % set diagonal elements to 0 (we do not need self-similarity)
    result = C(~any(n(:) == s)); % keep the sets that contain no other set
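
To see why the comparison `n(:) == s` detects subsets, here is a minimal sketch on a smaller, made-up example. It builds the membership matrix densely with `ismember` instead of `accumarray`, purely for readability; the names `vals`, `isSubset` and `keep` are illustrative and not part of the original answer. Like the code above, it relies on implicit expansion (R2016b or later) and assumes no array contains repeated values.

```matlab
% Illustrative three-set example (hypothetical data, not from the question).
C = {[1,2,3], [2,3], [5]};
n = cellfun(@numel, C);              % set sizes: [3 2 1]
vals = unique([C{:}]);               % distinct values: [1 2 3 5]
b = zeros(numel(C), numel(vals));    % dense membership matrix
for k = 1:numel(C)
    b(k, :) = ismember(vals, C{k});  % row k marks the values present in C{k}
end
s = b * b.';                         % s(i,j) = number of shared elements
s(1:size(s,1)+1:end) = 0;            % ignore self-comparison
isSubset = (n(:) == s);              % isSubset(i,j): C{i} lies entirely in C{j}
keep = ~any(isSubset, 1);            % sets that contain no other set
result = C(keep);                    % {[2,3], [5]}
```

Here only `isSubset(2,1)` is true, because `[2,3]` shares both of its elements with `[1,2,3]`, so the first set is dropped.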
    

    However, the matrix may be very large, so it is better to use a loop to avoid memory problems:

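    idx = false(1, numel(C));
    for k = 1:numel(C)
        % s is symmetric, so row k holds the overlap of set k with every other set;
        % set k is kept if no other set j satisfies n(j) == s(k,j)
        idx(k) = ~any(n == full(s(k, :)));
    end
    result = C(idx);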
    

    Or follow a vectorized approach:

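    [r, c, v] = find(s);            % nonzero overlaps only (v here replaces the earlier v)
    idx = sub2ind(size(s), r, c);   % linear indices of those entries
    s(idx) = v.' == n(r);           % 1 where set r is a subset of set c, 0 otherwise
    result = C(~any(s));            % keep the sets that contain no other set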
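
As a sanity check for any of the variants above, a brute-force comparison can be run on small inputs. This is only a sketch for testing, not the approach of the linked post: it calls `ismember` for every pair of sets, so its cost grows quadratically with `numel(C)`. The names `keep` and `expected` are illustrative.

```matlab
% Brute-force cross-check; intended for small inputs only.
C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]};
keep = true(1, numel(C));
for j = 1:numel(C)
    for i = 1:numel(C)
        % C{j} is dropped if some other set C{i} lies entirely inside it
        if i ~= j && all(ismember(C{i}, C{j}))
            keep(j) = false;
            break
        end
    end
end
expected = C(keep);   % compare with `result` from any variant above
```

For the example in the question, `expected` should match D, for instance via `isequal(result, expected)`.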