regex preg-match

Regex: match lines that lack word1, or that include word2


I'm looking for a regex that matches certain URLs:

I want to match any URL that does not include the word "Koeln", and also any URL that contains the word "Karneval" (regardless of whether it contains "Koeln").

Example:

1) http://www.news.com/Report-Deutschland/Panorama/Deutschland/story.html

2) http://www.news.com/Koeln/Karneval/story.html

3) http://www.news.com/Koeln/Koelnaktuell/story.html

1) and 2) should match: 1) because it doesn't include "Koeln", and 2) because it includes "Karneval". 3) should not match because it includes "Koeln" but not "Karneval".
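
Stated without a regex, the rule is a two-part condition, which can serve as a reference when checking candidate patterns. A minimal sketch in Python (the names `should_match` and `EXAMPLES` are made up for illustration; the same check could be written with `strpos` in PHP):

```python
def should_match(url: str) -> bool:
    # Accept when "Karneval" is present, or when "Koeln" is absent.
    return "Karneval" in url or "Koeln" not in url


# The three example URLs from the question.
EXAMPLES = [
    "http://www.news.com/Report-Deutschland/Panorama/Deutschland/story.html",
    "http://www.news.com/Koeln/Karneval/story.html",
    "http://www.news.com/Koeln/Koelnaktuell/story.html",
]

for number, url in enumerate(EXAMPLES, start=1):
    print(number, should_match(url))
```

This comparison is case-sensitive, so "koeln" in lowercase would not count as "Koeln".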

I tried many different regexes using positive and negative lookaheads, but none of them has worked so far.

I plan to implement the regex with preg in PHP.


Solution

  • Not sure if this is the best approach here, but you can give this a shot and see if it works for you:

    (http://.*?/Karneval.*$|http://www\.news\.com(?!/Koeln).*$)
    

    This is basically two alternatives: one that matches any URL containing /Karneval, and one that matches URLs where /Koeln does not come directly after www.news.com.

    Here's a demo you can try: Regex101 Demo

    Hopefully this works for you or at least points you in the right direction.
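
To check the pattern against the question's examples, here is a small sketch in Python, whose `re` module supports the same lookahead syntax as PCRE. The fourth URL and the `GENERAL` pattern are assumptions added for illustration and are not part of the answer: the answer's second alternative only rejects `/Koeln` directly after the host, so a URL with "Koeln" deeper in the path still matches. `GENERAL` is one possible way to test for both words anywhere in the string.

```python
import re

# Pattern from the answer, unchanged.
ANSWER = re.compile(
    r"(http://.*?/Karneval.*$|http://www\.news\.com(?!/Koeln).*$)"
)

# Hypothetical alternative: accept if "Karneval" appears anywhere,
# or if "Koeln" appears nowhere in the string.
GENERAL = re.compile(r"^(?:(?=.*Karneval)|(?!.*Koeln)).*$")

URLS = [
    "http://www.news.com/Report-Deutschland/Panorama/Deutschland/story.html",
    "http://www.news.com/Koeln/Karneval/story.html",
    "http://www.news.com/Koeln/Koelnaktuell/story.html",
    # Made-up URL: "Koeln" is not the first path segment.
    "http://www.news.com/Sport/Koeln/story.html",
]

for url in URLS:
    answer_hit = bool(ANSWER.search(url))
    general_hit = bool(GENERAL.search(url))
    print(f"answer={answer_hit!s:5} general={general_hit!s:5} {url}")
```

Both patterns agree on the three URLs from the question; they differ only on the made-up fourth one. In PHP, either pattern would need delimiters for `preg_match`, for example `~` instead of `/` so the slashes in the URL need no escaping.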