How to use regexp in MATLAB to strictly match a substring and not a larger string containing that substring?


I would like to find whether a cell contains the substring foo and only this string (nothing before, nothing after) in a series of cells that may contain foobar.

I am currently using regexp in MATLAB and would like to tweak the search pattern so that it excludes cells whose string merely contains the substring I defined.

I know it kind of goes against the very idea of regexp, but I am fairly certain there is a way to do what I want.

As an MWE, here is a snippet of my data (a cell array), called potentialfields:

'horaracha'
'sol'
'presmax'
'horapresmax'
'presmin'
'horapresmin'

and the regexp expression that I am currently using:

selected_fields={'sol','presmin'};
diffset=setdiff(potentialfields,selected_fields);
pattern=strjoin(diffset,'|');
idx_to_delete=~cellfun(@isempty,regexp(potentialfields,pattern));

The expected output of idx_to_delete is the following:

1 0 1 1 0 1

At the moment, the output is 1 0 1 1 1 1 because horapresmin contains presmin.

Thank you very much in advance.


Solution

  • regexp is overkill here. ismember is a built-in function that tests for exact matches of strings in a cell array:

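    % Negated so the mask is true for fields that are NOT in selected_fields,
    % matching the expected output 1 0 1 1 0 1 from the question.
    idx_to_delete = ~ismember( potentialfields, selected_fields );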
    

    If you're really set on regexp you can use the start anchor (^) and end anchor ($) like so:

    pattern = ['^(', strjoin( selected_fields, '|' ), ')$'];
    % A field is flagged for deletion when the anchored pattern finds no match.
    idx_to_delete2 = cellfun( @isempty, regexp( potentialfields, pattern ) );
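
To check both approaches against the expected output from the question, here is a minimal self-contained sketch using the sample data above. Both masks are written so that true means "delete", i.e. the field is not one of selected_fields. The variable names idx_ismember, idx_regexp and escaped are only illustrative, and the regexptranslate step is an extra precaution that the question does not strictly need: it assumes field names could contain regexp metacharacters such as '.' or '+'.

```matlab
% Sample data taken from the question.
potentialfields = {'horaracha'; 'sol'; 'presmax'; 'horapresmax'; 'presmin'; 'horapresmin'};
selected_fields = {'sol', 'presmin'};

% Approach 1: exact matching with ismember.
% The mask is true where the field is NOT one of the selected fields.
idx_ismember = ~ismember(potentialfields, selected_fields);

% Approach 2: anchored regexp.
% Escape each name so any metacharacters are matched literally (assumed precaution).
escaped = cellfun(@(s) regexptranslate('escape', s), selected_fields, ...
                  'UniformOutput', false);
pattern = ['^(', strjoin(escaped, '|'), ')$'];

% With 'once', regexp returns one start index (or []) per cell;
% an empty result means the whole string did not match any selected name.
idx_regexp = cellfun(@isempty, regexp(potentialfields, pattern, 'once'));

% Both masks should agree with the expected output 1 0 1 1 0 1.
disp(idx_ismember.');
disp(isequal(idx_ismember, idx_regexp));
```

The anchors ^ and $ are what prevent 'horapresmin' from being treated as a match for 'presmin': the alternation has to span the whole string, not just part of it.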