python regex rotten-tomatoes

Regular expression on rotten tomatoes URL -- exclude stem


I want to return a match on a TV series URL:

YES: http://www.rottentomatoes.com/tv/falling-skies/

But not on a TV episode or TV season URL:

NO: http://www.rottentomatoes.com/tv/falling-skies/s03
NO: http://www.rottentomatoes.com/tv/falling-skies/s12/e01

I currently have the following regex:

match = re.match('(http(s)?://)?(www.)?rottentomatoes.com/tv/.+', url)

This matches all three of the above. How would I construct the regex to only match the first one?


Solution

  • Use a negated character class instead of .+:

    ^http://www\.rottentomatoes\.com/tv/[^/]+/?$
    

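    [^/]+ matches one or more characters that are not a slash, i.e. everything after tv/ up to the next slash (or to the end of the string if there is no further /). The optional /? then allows a single trailing slash, and the $ anchor requires the string to end there, so the season and episode URLs no longer match.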

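
For what it's worth, here is a minimal Python sketch of the negated character class applied to the three URLs from the question. The helper name is_series_url is made up for illustration, and the optional scheme and www. prefix are an assumption carried over from the question's original pattern; the regex in the answer itself only accepts http://www.

```python
import re

# Assumption: the scheme and "www." are optional, as in the question's pattern.
# [^/]+ consumes the series slug, /? allows one trailing slash, $ ends the match.
SERIES_URL = re.compile(
    r'^(?:https?://)?(?:www\.)?rottentomatoes\.com/tv/[^/]+/?$'
)


def is_series_url(url):
    """Return True for a series page, False for a season or episode page."""
    return SERIES_URL.match(url) is not None


urls = [
    'http://www.rottentomatoes.com/tv/falling-skies/',
    'http://www.rottentomatoes.com/tv/falling-skies/s03',
    'http://www.rottentomatoes.com/tv/falling-skies/s12/e01',
]

for url in urls:
    print(is_series_url(url), url)
```

Only the first URL should print True. If the stricter http://www. form is what you want, drop the two optional groups and use the pattern exactly as written in the answer.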