
Finding string in cell array of cell arrays


Using Matlab, say that we have a cell array of cell arrays. For example:

C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} }

I would like to find the index of all of the cell arrays in C that contain the string 'hello'. In this case, I would expect 1 and 2, because the 1st cell array has 'hello' in the first slot and the 2nd cell array has it in the third slot.

I imagine this would be quite a bit easier with a matrix (a simple find), but for educational purposes I'd like to learn how to do it with a cell array of cell arrays as well.

Many thanks in advance.


Solution

  • Straightforward Approaches

    With arrayfun -

    out = find(arrayfun(@(n) any(strcmp(C{n},'hello')),1:numel(C)))
    

    With cellfun -

    out = find(cellfun(@(x) any(strcmp(x,'hello')),C))
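    Because the question asks about the process, the following is a minimal sketch of what the cellfun one-liner does, written out as explicit loops. The names target, found and inner are illustrative choices, not part of the original answer. Each inner cell array is scanned with strcmp, and the outer index is recorded on the first match.

```matlab
% Input from the question: a cell array of cell arrays of strings
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };
target = 'hello';                 % illustrative name: the string to search for

% Explicit-loop equivalent of find(cellfun(@(x) any(strcmp(x,target)),C))
found = false(1, numel(C));       % found(n) becomes true if C{n} contains target
for n = 1:numel(C)
    inner = C{n};                 % the n-th inner cell array
    for k = 1:numel(inner)
        if strcmp(inner{k}, target)
            found(n) = true;
            break;                % one match is enough for this inner array
        end
    end
end
out = find(found)                 % expected: [1 2]
```

    The cellfun and arrayfun versions perform the same inner test; they just hide the outer loop.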
    

    Alternative Approach

    You can adopt a different approach that converts the input cell array of cell arrays of strings into a single 2D cell array of strings, removing one level of "cell hierarchy". strcmp can then be applied to the whole array at once, avoiding cellfun and arrayfun, which may make it faster than the approaches listed earlier. Note that this approach makes more sense from a performance point of view if the number of cells in each inner cell array doesn't vary much, because the conversion produces a 2D cell array whose shorter columns are padded with empty cells.

    Here's the implementation -

    %// Convert the cell array of cell arrays to a 2D cell array of strings,
    %// i.e. remove one level of "cell hierarchy". Each column of C1 holds
    %// the strings of one inner cell array, padded with empty cells.
    lens = cellfun('length',C);       %// number of strings in each inner cell array
    max_lens = max(lens);
    C1 = cell(max_lens,numel(C));
    C1(bsxfun(@le,(1:max_lens)',lens)) = [C{:}];
    
    %// Use strcmp directly on C1, without cellfun, which may speed it up
    out = find(any(strcmp(C1,'hello'),1))
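    On MATLAB R2016b or later (and in GNU Octave), relational operators expand singleton dimensions implicitly, so the bsxfun call can be written as a plain <= comparison. The sketch below assumes that, and also assumes every inner cell array is a row, since [C{:}] concatenates them horizontally. The name mask is an illustrative choice.

```matlab
% Assumes MATLAB R2016b+ or GNU Octave (implicit expansion for <=)
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };

lens = cellfun('length', C);          % number of strings in each inner cell array
max_lens = max(lens);
C1 = cell(max_lens, numel(C));        % padded 2D cell array, one column per inner array
mask = (1:max_lens)' <= lens;         % column vs row vector -> max_lens x numel(C) mask
C1(mask) = [C{:}];                    % fill each column top-down with its inner strings

out = find(any(strcmp(C1, 'hello'), 1))   % expected: [1 2]
```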
    

    Explanation:

    [1] Convert cell array of cell arrays of strings to cell array of strings:

    C = { {'hello' 'there' 'friend'}, {'do' 'hello'}, {'or' 'maybe' 'not'} }
    

    gets converted to

    C1 = {
        'hello'     'do'       'or'   
        'there'     'hello'    'maybe'
        'friend'         []    'not'  }
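    A minimal sketch of how that padded C1 is built for this example: the logical mask marks which slots of each column receive a string, and because logical indexing fills in column-major order, [C{:}] lands column by column. The name mask is illustrative.

```matlab
% The example from step [1]: the second inner cell array is shorter
C = { {'hello' 'there' 'friend'}, {'do' 'hello'}, {'or' 'maybe' 'not'} };

lens = cellfun('length', C)            % expected: [3 2 3]
max_lens = max(lens);                  % 3 rows in the padded array
mask = bsxfun(@le, (1:max_lens)', lens)
% expected mask (3x3 logical):
%   1 1 1
%   1 1 1
%   1 0 1   <- row 3 of column 2 stays an empty cell
C1 = cell(max_lens, numel(C));
C1(mask) = [C{:}];   % [C{:}] is {'hello' 'there' 'friend' 'do' 'hello' 'or' 'maybe' 'not'}
```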
    

    [2] For each column, check whether any entry is the string 'hello', and return the IDs of those columns as the final output.
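    Step [2] on the converted example, spelled out: strcmp compares element-wise across the cell array and returns false for the empty padding cell, then any(...,1) reduces each column to a single true/false. The names hits and col_has_hello are illustrative.

```matlab
% C1 from step [1], typed in directly; [] is the empty padding cell
C1 = { 'hello',  'do',    'or'
       'there',  'hello', 'maybe'
       'friend', [],      'not' };

hits = strcmp(C1, 'hello')      % element-wise comparison, false for the [] cell
% expected hits:
%   1 0 0
%   0 1 0
%   0 0 0
col_has_hello = any(hits, 1)    % expected: [1 1 0]
out = find(col_has_hello)       % expected: [1 2]
```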