I have a film title in the following format
(Studio Name) - Film Title Part-1** - Animation** (2014).mp4
The part in BOLD is optional, meaning I can have a title such as this
(Studio Name) - Film Title Part-1 (2014).mp4
With this regex
^\((?P<studio>.+)\) - (?P<title>.+)(?P<genre>-.+)\((?P<year>\d{4})\)
I get the following results
studio = Studio Name
title = Film Title Part-1
genre = - Animation
year = 2014
I tried to make the "- Animation" part optional by changing the regex to
^\((?P<studio>.+)\) - (?P<title>.+)(?:(?P<genre>-.+)?)\((?P<year>\d{4})\)
but I end up with the following results
studio = Studio Name
title = Film Title Part-1 - Animation
genre =
year = 2014
I am using Python. The code that I am executing to process the regex is
import re
# REGEX is one of the patterns above; film is the file name being parsed
pattern = re.compile(REGEX)
matched = pattern.search(film)
You can omit the non-capturing group around the genre, change the first .+ to a negated character class [^()]+ that matches any character except parentheses, and make the .+ in the title group non-greedy so that the optional genre group gets a chance to match.
For the genre, you could match .+, or make the match more specific if you only want to match a single word:
^\((?P<studio>[^()]+)\) - (?P<title>.+?)(?P<genre>- \w+ )?\((?P<year>\d{4})\)
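The effect of the non-greedy title can be checked by running the attempt from the question and the pattern above against the same file name. This is only a small sketch; the variable names greedy and lazy are chosen here for illustration.

```python
import re

name = "(Studio Name) - Film Title Part-1 - Animation (2014).mp4"

# Greedy title, as in the attempt from the question.
greedy = re.compile(
    r"^\((?P<studio>.+)\) - (?P<title>.+)(?:(?P<genre>-.+)?)\((?P<year>\d{4})\)"
)
# Non-greedy title with a more specific, optional genre.
lazy = re.compile(
    r"^\((?P<studio>[^()]+)\) - (?P<title>.+?)(?P<genre>- \w+ )?\((?P<year>\d{4})\)"
)

greedy_match = greedy.search(name)
lazy_match = lazy.search(name)

# repr() makes the trailing spaces in the captured text visible.
print(repr(greedy_match.group("title")), repr(greedy_match.group("genre")))
print(repr(lazy_match.group("title")), repr(lazy_match.group("genre")))
```

With the greedy version the title group swallows "- Animation" and the genre group stays empty; with the non-greedy version the genre group is tried at each step before the title grows further.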
Explanation
^
Start of string
\((?P<studio>[^()]+)\)
Named group studio: match one or more characters other than parentheses, between ( and )
 - 
Match a space, a hyphen, and a space literally
(?P<title>.+?)
Named group title: match any character except a newline, as few times as possible
(?P<genre>- \w+ )?
Optional named group genre: match a hyphen, a space, 1+ word characters, and a space
\((?P<year>\d{4})\)
Named group year: match 4 digits between ( and )
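As a quick check, the pattern can be run against both file names from the question. This is a minimal sketch: the helper name parse_film is hypothetical, and the clean-up step is an added convenience, since the pattern itself leaves a trailing space in title and the "- " prefix in genre.

```python
import re

PATTERN = re.compile(
    r"^\((?P<studio>[^()]+)\) - (?P<title>.+?)(?P<genre>- \w+ )?\((?P<year>\d{4})\)"
)

def parse_film(name):
    """Return the named groups as a dict, or None if the name does not match."""
    m = PATTERN.search(name)
    if m is None:
        return None
    fields = m.groupdict()
    # The raw title keeps a trailing space; the raw genre looks like "- Animation ".
    fields["title"] = fields["title"].strip()
    if fields["genre"] is not None:
        fields["genre"] = fields["genre"].strip("- ")
    return fields

with_genre = parse_film("(Studio Name) - Film Title Part-1 - Animation (2014).mp4")
without_genre = parse_film("(Studio Name) - Film Title Part-1 (2014).mp4")
print(with_genre)
print(without_genre)
```

When the genre is absent, the genre group is None rather than an empty string, so it is worth testing for None before using it.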
If you want to match the whole line:
^\((?P<studio>[^()]+)\) - (?P<title>.+?)(?P<genre>- \w+ )?\((?P<year>\d{4})\)\.mp4$
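A short sketch of how the anchored version behaves, assuming each file name arrives as its own string. The last two names in the list are made up for illustration and are not from the question.

```python
import re

FULL = re.compile(
    r"^\((?P<studio>[^()]+)\) - (?P<title>.+?)(?P<genre>- \w+ )?\((?P<year>\d{4})\)\.mp4$"
)

names = [
    "(Studio Name) - Film Title Part-1 - Animation (2014).mp4",
    "(Studio Name) - Film Title Part-1 (2014).mp4",
    "(Studio Name) - Film Title Part-1 (2014).mkv",       # hypothetical: other extension
    "(Studio Name) - Film Title Part-1 (2014).mp4.part",  # hypothetical: trailing text
]

# Only names that end in ".mp4" right after the year are accepted.
accepted = [n for n in names if FULL.search(n)]
for n in accepted:
    print(n)
```

Without the trailing \.mp4$ the last two names would also match, because search only requires the pattern to match from the start of the string up to the closing parenthesis of the year.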