Tags: javascript, python, regex, regex-lookarounds

regex matches string despite negative lookahead


I want to match the first two words in a string, except when the second word is "feat"; in that case I only want to match the first word.

My attempt, (\w+(?: \w+))(?!feat), does not work: "feat" gets matched every time. I tried variations of it, but to no avail.

Here's an example string: "Technotronic feat Ya Kid K"
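Running the attempt above on that example string shows where the lookahead actually gets evaluated. A minimal Python sketch (Python is chosen because the code later in the question is Python):

```python
import re

# The attempt from the question: the lookahead comes *after* the second word.
attempt = re.compile(r"(\w+(?: \w+))(?!feat)")

m = attempt.search("Technotronic feat Ya Kid K")
# " feat" is consumed by the group first, so (?!feat) only inspects
# the remaining " Ya Kid K", which does not start with "feat".
print(m.group(1))  # Technotronic feat
```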

Thank you for your help!

Edit:

This is the string where it fails: "Technotronic feat Ya Kid K"

This is the code that should cut the string:

import re

pattern = re.compile("^\w+(?: (?!feat\b)\w+)?")

def cut(string):
    result = pattern.search(string).group(0)

    return result
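A likely reason this snippet still returns "Technotronic feat": the pattern is a plain (non-raw) string literal, and in Python's string syntax `\b` is the backspace escape. The regex engine therefore receives a literal backspace character instead of a word boundary, so `(?!feat\b)` never rejects `feat`. A small comparison sketch (the `\w` parts are spelled `\\w` in the plain literal so that only `\b` differs between the two patterns):

```python
import re

text = "Technotronic feat Ya Kid K"

# Plain literal: "\b" becomes chr(8) (backspace) before re ever sees it.
plain = "^\\w+(?: (?!feat\b)\\w+)?"
# Raw literal: the backslash survives, so re sees the word boundary \b.
raw = r"^\w+(?: (?!feat\b)\w+)?"

print(repr("\b"))                       # '\x08'
print(re.search(plain, text).group(0))  # Technotronic feat
print(re.search(raw, text).group(0))    # Technotronic
```

Writing regex patterns as raw strings (`r"..."`) avoids this whole class of escape collisions.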

Solution

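  • You can use either of these (the second one accepts one or more whitespace characters of any kind between the words):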

    \w+(?: (?!feat\b)\w+)?
    \w+(?:\s+(?!feat\b)\w+)?
    


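    The point is that the lookahead must restrict what the second \w+ matches, so it has to be placed right before that \w+: a lookahead only checks the text immediately to the right of the current position. In (\w+(?: \w+))(?!feat), the check runs after the second word has already been consumed, so it inspects whatever follows that word, not the word itself. To still allow words that merely start with feat (e.g. feature), put a word boundary after feat inside the lookahead.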
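Both effects can be seen side by side in a small sketch comparing the correctly placed lookahead with and without the trailing `\b` (the variable names are only illustrative):

```python
import re

# Lookahead placed right before the second \w+, with and without a word boundary.
with_boundary = re.compile(r"\w+(?: (?!feat\b)\w+)?")
without_boundary = re.compile(r"\w+(?: (?!feat)\w+)?")

for text in ["Technotronic feat Ya Kid K", "Technotronic feature Ya Kid K"]:
    print(repr(with_boundary.match(text).group(0)),
          repr(without_boundary.match(text).group(0)))
# 'Technotronic' 'Technotronic'
# 'Technotronic feature' 'Technotronic'
```

Without `\b`, the lookahead also fires on the first four letters of "feature", so the optional group is skipped and only the first word is returned.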

    Regex details (for the \s+ version; the first pattern uses a single literal space in its place):

    • \w+ - one or more word chars
    • (?:\s+(?!feat\b)\w+)? - an optional non-capturing group:
      • \s+ - one or more whitespace chars
      • (?!feat\b) - immediately to the right, there cannot be the whole word feat (so the subsequent \w+ won't match feat, but will match feature)
      • \w+ - one or more word chars.
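The breakdown above maps directly onto a `re.VERBOSE` version of the second pattern, where each piece carries its explanation as a comment. This is a sketch; the name `FIRST_WORDS` is only illustrative. In verbose mode unescaped whitespace in the pattern is ignored, which is another reason `\s+` is convenient here:

```python
import re

FIRST_WORDS = re.compile(
    r"""
    \w+            # one or more word chars (the first word)
    (?:            # optional non-capturing group:
        \s+        #   one or more whitespace chars
        (?!feat\b) #   the next word must not be the whole word "feat"
        \w+        #   one or more word chars (the second word)
    )?
    """,
    re.VERBOSE,
)

for text in ["Technotronic feat Ya Kid K",
             "Technotronic\tfeature Ya Kid K",
             "Technotronic   Ya Kid K"]:
    print(repr(FIRST_WORDS.match(text).group(0)))
# 'Technotronic'
# 'Technotronic\tfeature'
# 'Technotronic   Ya'
```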

    See the Python demo:

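    import re

    # ^ anchors at the start; the lookahead sits right before the second \w+
    pattern = re.compile(r"^\w+(?: (?!feat\b)\w+)?")
    
    def cut(text):
        m = pattern.search(text)
        if m:
            return m.group(0)
        return text  # no match (e.g. text starts with a non-word char): return input unchanged
    
    print(cut("Technotronic feat Ya Kid K"))    # => Technotronic
    print(cut("Technotronic feature Ya Kid K")) # => Technotronic feature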
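A few extra inputs (made up for illustration) show how the anchored pattern and the fallback in `cut` treat some edge cases:

```python
import re

pattern = re.compile(r"^\w+(?: (?!feat\b)\w+)?")

def cut(text):
    m = pattern.search(text)
    return m.group(0) if m else text

# Hypothetical inputs exercising a few edge cases.
for text in [
    "Technotronic",                 # a single word: the optional group is skipped
    "Technotronic feat. Ya Kid K",  # \b also holds between "feat" and "."
    "Technotronic Feat Ya Kid K",   # the lookahead is case-sensitive
    " Technotronic feat",           # leading space: ^\w+ fails, input returned as-is
]:
    print(repr(cut(text)))
# 'Technotronic'
# 'Technotronic'
# 'Technotronic Feat'
# ' Technotronic feat'
```

If "Feat" or "FEAT" should also be excluded, compiling the pattern with `re.IGNORECASE` is one option.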