
How to use regexp (regular expression) in Matlab/Octave to find overlapping matches


Say I want to use MATLAB's or Octave's regexp function to find where the substring 'var' occurs when it is preceded by a comma or colon and also followed by a comma or colon. For example, say

line = ':var,var:'

In this case I want the answer to be [2 6], because 'var' starts at positions 2 and 6.

However, if I do

>> regexp(line, '[,:]var[,:]') + 1
   ans = 2

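I only get the first position, 2, but not the second, 6. This is because regexp does not return overlapping matches. The first match, ':var,', consumes the comma at position 5, and the search resumes after it, so that comma cannot also serve as the leading delimiter of the second 'var'.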
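A quick way to see the consumed comma is to ask regexp for the start index, end index and matched text of each match. This sketch uses the 'start', 'end' and 'match' output selectors, which return the outputs in the order they are requested:

```matlab
line = ':var,var:';

% Request start index, end index and matched text, in that order.
[s, e, m] = regexp(line, '[,:]var[,:]', 'start', 'end', 'match');

% Only one match is found: ':var,' spanning positions 1 to 5.
% The search resumes at position 6 ('v'), where no unconsumed
% leading delimiter is left for [,:], so the second 'var' is missed.
% Here s is 1, e is 5 and m is {':var,'}.
```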

How can I make regexp consider overlapping matches and return [2 6]?


Solution

  • Use lookarounds:

    (?<=[,:])var(?=[,:])
    

                             EXPLANATION
    --------------------------------------------------------------------------------
      (?<=                     look behind to see if there is:
    --------------------------------------------------------------------------------
        [,:]                     any character of: ',', ':'
    --------------------------------------------------------------------------------
      )                        end of look-behind
    --------------------------------------------------------------------------------
      var                      'var'
    --------------------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    --------------------------------------------------------------------------------
        [,:]                     any character of: ',', ':'
    --------------------------------------------------------------------------------
      )                        end of look-ahead
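Applied to the example string, the lookaround pattern returns both positions. Because the lookbehind and lookahead only test for a delimiter without consuming it, the comma at position 5 is available to both matches. And since the match itself is just 'var', the returned start indices need no +1 correction. The second call is an optional variant, assuming you prefer to keep the original leading [,:] in the match: turning only the trailing delimiter into a lookahead is enough here, because the trailing comma was the character being consumed.

```matlab
line = ':var,var:';

% Lookbehind (?<=[,:]) and lookahead (?=[,:]) test for a delimiter
% without consuming it, so the comma at position 5 can close the
% first match and open the second.
idx = regexp(line, '(?<=[,:])var(?=[,:])');
% idx is [2 6]: the match is 'var' itself, so no +1 is needed.

% Variant: keep the leading delimiter in the match and make only the
% trailing one a lookahead. Matches start at 1 (':var') and 5 (',var'),
% so +1 again gives the positions of 'var'.
idx2 = regexp(line, '[,:]var(?=[,:])') + 1;
% idx2 is [2 6]
```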