python python-3.x regex regex-lookarounds regex-group

How do I exclude a pattern with a lookbehind that is in front of another pattern?


How do I avoid detecting (and capturing) a match when it is preceded by this regex pattern: r"(?:(?<=\s)|^)dont\s*"?

This is the pattern I want to use to exclude matches. It uses a lookbehind, "(?:(?<=\s)|^)dont", to check for whitespace or the start of the string before the word "dont". This ensures that "dont" only counts when it starts a word, i.e. when it is not glued to a preceding non-whitespace character. (Python's re module rejects the shorter form "(?<=\s|^)dont", because the two branches of that lookbehind have different widths.)
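A minimal sketch of how the two spellings of the lookbehind behave, assuming the standard library re module (which only accepts fixed-width lookbehinds). The helper name compiles and the sample sentences are made up for illustration.

```python
import re

def compiles(pattern):
    """Hypothetical helper: report whether re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# \s is one character wide and ^ is zero wide, so this lookbehind is rejected.
print(compiles(r"(?<=\s|^)dont"))
# Moving ^ outside the lookbehind keeps the same meaning and compiles.
print(compiles(r"(?:(?<=\s)|^)dont"))

word_initial_dont = re.compile(r"(?:(?<=\s)|^)dont")
print(bool(word_initial_dont.search("I dont like it")))  # "dont" after a space
print(bool(word_initial_dont.search("Idont like it")))   # "dont" glued to "I"
```

The first check should come back False and the second True; the last two lines show that only the word-initial "dont" is found.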

Basically, if a "dont" that follows whitespace "\s" or the beginning of the string "^" appears before the original pattern, the match should not be detected, and therefore the capture group should not capture anything either.

import re

# example 1: should capture, because "Idont" does not match (?:(?<=\s)|^)dont
#input_text = "I think Idont like a lot red apples"
# example 2: should not capture, because "dont" follows a space
input_text = "I think I dont like a lot red apples"

interests_match = re.search(r"(?:like\s*a\s*lot\s+)(.+)", input_text, flags=re.IGNORECASE)

if interests_match: print(interests_match.group(1))

The correct output for each example:

"red apples" #example 1
None #example 2

Solution

  • This should do what you want.

    r"(?:(?:^|\s)dont.*)|(?:like\s*a\s*lot\s+)(.+)"
    

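    The alternative to the left of the top-level | (the second | in the pattern) matches a "dont" at the start of the string or after whitespace and then consumes the rest of the line, so the "like a lot" alternative never gets to match after it and (.+) captures nothing.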

    Note: In that case the match object still exists but group 1 is None, so check group 1 before using it as a string to avoid an error.
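A minimal sketch of the pattern above applied to both inputs from the question, including the group 1 check. The function name extract_interest is hypothetical and only wraps re.search for illustration.

```python
import re

# Left alternative swallows everything from a word-initial "dont" onward;
# right alternative captures whatever follows "like a lot".
PATTERN = re.compile(
    r"(?:(?:^|\s)dont.*)|(?:like\s*a\s*lot\s+)(.+)",
    flags=re.IGNORECASE,
)

def extract_interest(text):
    """Return the text after "like a lot", or None if nothing was captured."""
    match = PATTERN.search(text)
    if match is None:
        return None  # neither alternative matched
    return match.group(1)  # None when only the "dont" alternative matched

print(extract_interest("I think Idont like a lot red apples"))   # example 1
print(extract_interest("I think I dont like a lot red apples"))  # example 2
```

This should print red apples for example 1 and None for example 2. One caveat: the exclusion only applies when "dont" comes before "like a lot"; if "dont" appears later in the line, the right-hand alternative matches first and (.+) captures the rest of the line, "dont" included.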