RegEx matching on episode titles with different resolution suffixes


I'm trying to capture the show name, episode number, episode title, and resolution (if present). Standard-def episodes in my collection don't have a resolution suffix.

For the given samples:

Show Name - S01E02 - This Is a High-Def Episode Title - 720p
Show Name - S01E03 - This Is a High-Def Episode Title - 1080p
Show Name - S01E04 - This Is a Standard-Def Episode Title
Show Name - S01E05E06 - This Is a High-Def Double Episode Title - 720p

This is as close as I can get on regex101.com:

(?<show>[\w ]+) - (?<episode>S[0-9]{2}E[0-9]{2}E?[0-9]{0,2}) - (?<title>[\w -]+)(?: - )(?<res>(?:720p)|(?:1080p))

It captures all the samples with resolutions correctly. But the moment I add a ? to the last capture group (which does make the standard-def episode match), the title group absorbs the resolution. I think I need a negative lookahead in the title group, but I'm not sure how to do that and still capture the title. And yes, episode titles can have dashes in them.

Any pointers appreciated. I'm writing my renaming script in C#, in case that matters for code snippets. Thanks.


Solution

  • You can use

    ^(?<show>.*?) - (?<episode>S[0-9]{2}E[0-9]{2}(?:E[0-9]{2})?) - (?<title>.*?)(?: - (?<res>(?:720|1080)p))?$
    

    See the regex demo. Details:

    • ^ - start of string
    • (?<show>.*?) - Group "show": any zero or more chars other than a newline, as few as possible
    • " - " - a literal " - " text (space, hyphen, space)
    • (?<episode>S[0-9]{2}E[0-9]{2}(?:E[0-9]{2})?) - Group "episode": S, two digits, E, two digits and an optional group matching one or zero occurrences of E and two digits
    • " - " - a literal " - " text
    • (?<title>.*?) - Group "title": any zero or more chars other than a newline, as few as possible
    • (?: - (?<res>(?:720|1080)p))? - an optional sequence of
      • " - " - a literal " - " text
      • (?<res>(?:720|1080)p) - Group "res": 720 or 1080 followed by p
    • $ - end of string.
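
    Since the question mentions a C# renaming script, here is a minimal sketch of how the pattern above might be used with System.Text.RegularExpressions. The class name EpisodeParser, the Parse helper, and the "(none)" placeholder are assumptions made for illustration; only the pattern and the four sample names come from the post. The lazy title group keeps growing only until the optional resolution group and $ can match, which is why the resolution no longer ends up inside the title.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is not from the original post.
public static class EpisodeParser
{
    // The pattern from the answer, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"^(?<show>.*?) - (?<episode>S[0-9]{2}E[0-9]{2}(?:E[0-9]{2})?) - (?<title>.*?)(?: - (?<res>(?:720|1080)p))?$");

    // Returns the Match for one file name; check Match.Success before reading groups.
    public static Match Parse(string fileName)
    {
        return Pattern.Match(fileName);
    }

    public static void Main()
    {
        // The four sample names from the question.
        string[] samples =
        {
            "Show Name - S01E02 - This Is a High-Def Episode Title - 720p",
            "Show Name - S01E03 - This Is a High-Def Episode Title - 1080p",
            "Show Name - S01E04 - This Is a Standard-Def Episode Title",
            "Show Name - S01E05E06 - This Is a High-Def Double Episode Title - 720p",
        };

        foreach (string name in samples)
        {
            Match m = Parse(name);
            if (!m.Success)
            {
                Console.WriteLine("No match: " + name);
                continue;
            }

            string show = m.Groups["show"].Value;
            string episode = m.Groups["episode"].Value;
            string title = m.Groups["title"].Value;

            // An optional group that did not take part in the match has Success == false.
            string res = m.Groups["res"].Success ? m.Groups["res"].Value : "(none)";

            Console.WriteLine("{0} | {1} | {2} | {3}", show, episode, title, res);
        }
    }
}
```

    Checking Groups["res"].Success rather than comparing the value to an empty string makes it explicit that the standard-def sample has no resolution suffix at all.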