python regex overlapping-matches

How to get this regex to include overlapping occurrences, but not too many?


I want to get all the occurrences of the pattern '[number]' including their context but I can't.

Here is my code:

import re
text = 'some crap [00][0] some more'
regex = r'\[[0-9]*\]'
regex = '.{0,10}' + regex + '.{0,10}'
occurrences = re.findall(regex, text)
for occ in occurrences:
    print(occ)

What is actually wrong!?

My code works just as I want in every case except when two [number] blocks have fewer than 10 characters between them; then it gives me one result where I'm looking for two. If I set the regex to include overlapping occurrences, it gives me a result for every possible context length. I can't fix the context length at exactly 10, because I also want to catch occurrences near the beginning and end of the string.
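For reference, here is a minimal reproduction of the behaviour described above on the sample text. The variable names `token` and `greedy` are only illustrative; the pattern is the same one built in the code above.

```python
import re

text = 'some crap [00][0] some more'
token = r'\[[0-9]*\]'                      # one [number] block
greedy = '.{0,10}' + token + '.{0,10}'     # up to 10 characters of context on each side

occurrences = re.findall(greedy, text)
for occ in occurrences:
    print(occ)
# prints a single line: some crap [00][0] some m
```

The trailing `.{0,10}` swallows the second block `[0]` as context for the first, so `re.findall`, which only returns non-overlapping matches, never reports it separately.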

What I actually want:

I prefer a pure regex solution to get me all the occurrences of the mentioned pattern including their context.

If that is really impossible, I'd be fine with a solution that uses the match positions and slices a range from the string.


Solution

  • Read about non-capturing groups and negative lookahead.

    To fix your issue, just change the fourth line to:

    regex = '(?:(?!' + regex + ').){0,10}' + regex + '(?:(?!' + regex + ').){0,10}'
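As a rough check of the one-line change above, the sketch below runs it on the sample text from the question. It also shows the position-based fallback the question mentions, using `re.finditer` and slicing. The names `token`, `ctx`, `tempered`, `with_context` and `width` are illustrative and not part of the original code.

```python
import re

text = 'some crap [00][0] some more'
token = r'\[[0-9]*\]'                      # one [number] block

# A context character is any character at which no [number] block starts,
# so the context can never swallow a neighbouring block.
ctx = '(?:(?!' + token + ').)'
tempered = ctx + '{0,10}' + token + ctx + '{0,10}'

tempered_matches = re.findall(tempered, text)
print(tempered_matches)
# ['some crap [00]', '[0] some more']


def with_context(text, pattern, width=10):
    """Position-based fallback: slice `width` characters around each match."""
    return [
        text[max(0, m.start() - width):m.end() + width]
        for m in re.finditer(pattern, text)
    ]


sliced = with_context(text, token)
print(sliced)
# ['some crap [00][0] some m', ' crap [00][0] some more']
```

The two approaches differ in what counts as context. With the lookahead pattern, the context stops at the neighbouring block, and because matches do not overlap, characters consumed by one match are not repeated in the next. The slicing version gives each block its full window of up to 10 characters on either side, including any neighbouring blocks, and shortens the window automatically at the start and end of the string.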