MATLAB: find the indices of the strings in a cell array whose characters are all contained in a given string (without repetition)


I have one string and a cell array of strings.

str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};

I want to obtain:

idx = [2, 3, 6, 8];

I have written some very long code that:

  1. keeps only the elements whose length is not greater than length(str);
  2. removes the elements that contain characters not included in str;
  3. finally, for each remaining element, checks its characters one by one against str.

Essentially, it is an almost brute-force approach and it runs very slowly. Is there a simple way to do this quickly?
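
For reference, the three steps listed above might look roughly like the following. This is only an illustrative sketch, not the code referred to in the question (which was not posted); the variable names keep, pool and ok are made up for the example.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};

keep = false(size(dic));
for k = 1:numel(dic)
    w = dic{k};
    % Step 1: skip elements longer than str.
    if numel(w) > numel(str), continue; end
    % Step 2: skip elements containing a character that is not in str.
    if ~all(ismember(w, str)), continue; end
    % Step 3: check character by character, using up one matching
    % character of str for every character of the element.
    pool = str;
    ok = true;
    for c = w
        p = find(pool == c, 1);
        if isempty(p), ok = false; break; end
        pool(p) = [];
    end
    keep(k) = ok;
end
idx = find(keep);
```

For the example data this gives idx = [2 3 6 8]. The inner loop over characters is the slow part.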

NB: I have just edited the question to make clear that a character may appear up to n times in an element if it appears n times in str. Thanks to Shai for pointing this out.


Solution

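  • You can sort the characters of each string and then match the sorted strings against a regular expression built from str. The pattern lists the distinct characters of str in sorted order, each allowed between zero and as many times as it occurs in str. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$: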

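    u = unique(str);                                    % distinct characters of str, sorted
    % Build the pattern: each character, with its count in str as the upper bound.
    t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
    s = cellfun(@sort, dic, 'UniformOutput', false);    % sort the characters of every element
    idx = find(~cellfun('isempty', regexp(s, t)));      % indices of the elements that match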
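
One caveat with the regular-expression approach: if str can contain regex metacharacters such as '.' or '*', they would have to be escaped before building the pattern. A possible alternative that avoids regular expressions altogether is to compare character counts directly. The following is a minimal sketch under that assumption, not part of the original answer; the helper names count, limit and fits are made up for the example.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};

u = unique(str);                              % distinct characters of str
count = @(w) arrayfun(@(c) nnz(w == c), u);   % occurrences in w of each character of u
limit = count(str);                           % how often each character occurs in str

% An element fits if it uses only characters of str, and none of them
% more often than str does.
fits = @(w) all(ismember(w, u)) && all(count(w) <= limit);
idx = find(cellfun(fits, dic));
```

For the example data this also gives idx = [2 3 6 8]. Whether it is faster than the regex version depends on the size of dic and the length of str, so it is worth timing both on real data.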