Search code examples
matlabvectorcellcell-array

How to find the vectors of the cell A that contain at least one element of the vector B?


How to find the vectors of A that contain at least one element of the vector B ?

example:

A = {[2 5],[8 9 2],[33 77 4],[102 6],[10 66 17 7 8 11],[110 99],[1 4 3],[15 41 88]}

B = [5 77 41 66 7]

Result = {[2 5],[33 77 4],[10 66 17 7 8 11],[15 41 88]}

Solution

  • Approaches

    With arrayfun and ismember -

    Result = A(arrayfun(@(n) any(ismember(B,A{n})),1:numel(A)))
    

    Or with arrayfun and bsxfun -

    Result = A(arrayfun(@(n) any(any(bsxfun(@eq,B(:),A{n}),2)),1:numel(A)))
    

    Or with arrayfun and setdiff -

    Result = A(arrayfun(@(n) numel(setdiff(B,A{n})) < numel(B),1:numel(A)))
    

    Or with arrayfun and intersect -

    Result = A(arrayfun(@(n) ~isempty(intersect(B,A{n})),1:numel(A)))
    

    One could also use cellfun here, such that the four counterpart cellfun based solutions end up like these -

    Result = A(cellfun(@(x) any(ismember(B,x)), A))
    
    Result = A(cellfun(@(x) any(any(bsxfun(@eq,B(:),x),2)),A))
    
    Result = A(cellfun(@(x) numel(setdiff(B,x)) < numel(B),A))
    
    Result = A(cellfun(@(x) ~isempty(intersect(B,x)),A))
    

    Taking different route [Using bsxfun's masking capability]

    Rather than going into those arrayfun or cellfun based approaches that are essentially loopy approaches, one can make the solution alot vectorized by converting A into a 2D numeric array. So, the idea here is to have a 2D array in which number of rows is the maximum number of elements in A and number of columns as number of cells in A. Each column of this array would hold elements from each cell of A and NaNs would fill up the empty spaces.

    The solution code with such an approach would look like this -

    lens = cellfun('length',A); %// number of elements in each cell of A
    mask = bsxfun(@ge,lens,(1:max(lens))'); %//'# mask of valid places in the 2D array
    A_arr = NaN(size(mask)); %//initialize 2D array in which A elements are to be put
    A_arr(mask) = [A{:}]; %// put the elements from A
    
    %// Find if any element from B is in any element along the row or dim-3       
    %// locations in A_arr. Then logically index into A with it for the final
    %// cell array output
    Result = A(any(any(bsxfun(@eq,A_arr,permute(B,[1 3 2])),1),3));
    

    Verification

    >> celldisp(Result)
    Result{1} =
         2     5
    Result{2} =
        33    77     4
    Result{3} =
        10    66    17     7     8    11
    Result{4} =
        15    41    88
    

    Benchmarking

    For people interested in seeing the runtime performances, here's a quick benchmarking test with a sufficiently huge datasize -

    %// Create inputs
    N = 10000; %// datasize
    max_num_ele = 100; %// max elements in any cell of A
    num_ele = randi(max_num_ele,N,1); %// number of elements in each cell of A
    A = arrayfun(@(n) randperm(N,num_ele(n)), 1:N, 'uni', 0); 
    B = randperm(N,num_ele(1));
    
    %// Warm up tic/toc.
    for k = 1:100000
        tic(); elapsed = toc();
    end
    
    %// Start timing all approaches
    disp('************************  With arrayfun **************************')
    disp('------------------------  With arrayfun + ismember')
    tic
    Result = A(arrayfun(@(n) any(ismember(B,A{n})),1:numel(A)));
    toc, clear Result
    
    disp('------------------------  With arrayfun + bsxfun')
    tic
    Result = A(arrayfun(@(n) any(any(bsxfun(@eq,B(:),A{n}),2)),1:numel(A)));
    toc, clear Result
    
    disp('------------------------  With arrayfun + setdiff')
    tic
    Result = A(arrayfun(@(n) numel(setdiff(B,A{n})) < numel(B),1:numel(A)));
    toc, clear Result
    
    disp('------------------------  With arrayfun + intersect')
    tic
    Result = A(arrayfun(@(n) ~isempty(intersect(B,A{n})),1:numel(A)));
    toc, clear Result
    
    disp('************************  With cellfun **************************')
    disp('------------------------  With cellfun + ismember')
    tic
    Result = A(cellfun(@(x)any(ismember(B,x)), A));
    toc, clear Result
    
    disp('------------------------  With cellfun + bsxfun')
    tic
    Result = A(cellfun(@(x) any(any(bsxfun(@eq,B(:),x),2)),A));
    toc, clear Result
    
    disp('------------------------  With cellfun + setdiff')
    tic
    Result = A(cellfun(@(x) numel(setdiff(B,x)) < numel(B),A));
    toc, clear Result
    
    disp('------------------------  With cellfun + setdiff')
    tic
    Result = A(cellfun(@(x) ~isempty(intersect(B,x)),A));
    
    disp('************************  With masking bsxfun **************************')
    tic
    lens = cellfun('length',A); %// number of elements in each cell of A
    mask = bsxfun(@ge,lens,(1:max(lens))'); %//'
    A_numarr = NaN(size(mask));
    A_numarr(mask) = [A{:}];
    Result = A(any(any(bsxfun(@eq,A_numarr,permute(B,[1 3 2])),1),3));
    toc
    

    The results thus obtained on my system were -

    ************************  With arrayfun **************************
    ------------------------  With arrayfun + ismember
    Elapsed time is 0.409810 seconds.
    ------------------------  With arrayfun + bsxfun
    Elapsed time is 0.157327 seconds.
    ------------------------  With arrayfun + setdiff
    Elapsed time is 1.154602 seconds.
    ------------------------  With arrayfun + intersect
    Elapsed time is 1.081729 seconds.
    ************************  With cellfun **************************
    ------------------------  With cellfun + ismember
    Elapsed time is 0.392375 seconds.
    ------------------------  With cellfun + bsxfun
    Elapsed time is 0.143341 seconds.
    ------------------------  With cellfun + setdiff
    Elapsed time is 1.101331 seconds.
    ------------------------  With cellfun + setdiff
    ************************  With masking bsxfun ********************
    Elapsed time is 0.067224 seconds.
    

    As one can see, cellfun based solutions are wee bit faster than their arrayfun based counterparts! Also, mask-based bsxfun approach looks like an interesting one, but do keep in mind it's memory-hungry nature.