I have written a Python script with the following function, which takes as input a file name that may contain more than one year.
CODE
import re
from datetime import datetime

def ExtractReleaseYear(title):
    rg = re.compile('.*?([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
    match = rg.search(title) # Using non-greedy match on filler
    if match:
        releaseYear = match.group(1)
        try:
            if int(releaseYear) >= 1900 and int(releaseYear) <= int(datetime.now().year) and int(releaseYear) <= 2099: # Film between 1900-2099
                return releaseYear
        except ValueError:
            print("ERROR: The film year in the file name could not be converted to an integer for comparison.")
            return ""

print(ExtractReleaseYear('2012.(2009).3D.1080p.BRRip.SBS.x264'))
print(ExtractReleaseYear('Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264'))
print(ExtractReleaseYear('2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264'))
OUTPUT
Returned: 2012 -- I'd like this to be 2009 (i.e. last occurrence of year in string)
Returned: 2012 -- This is correct! (last occurrence of year is the first one, thus right)
Returned: 2001 -- I'd like this to be 1968 (i.e. last occurrence of year in string)
ISSUE
As can be observed, the regex will only target the first occurrence of a year instead of the last. This is problematic because some titles (such as the ones included here) begin with a year.
Searching for ways to get the last occurrence of the year led me to resources on negative lookahead, the last occurrence of a repeated group, and the last 4 digits in a URL, none of which have gotten me any closer to the desired result. No existing question currently answers this particular case.
INTENDED OUTCOME
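There are two things you need to change:
1. The lazy .*? pattern must be turned into the greedy .* (in this case, the subpatterns after .* will match the last occurrence in the string).
2. The year must be read from group 2, because group 1 also captures any surrounding brackets. Alternatively, make the outer group non-capturing and keep reading group 1.
Both variants: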
rg = re.compile('.*([\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
...
releaseYear = match.group(2)
Or, with the outer group made non-capturing so that the year is in group 1:
rg = re.compile('.*(?:[\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?)', re.IGNORECASE|re.DOTALL)
...
releaseYear = match.group(1)
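For completeness, here is a minimal runnable sketch of the greedy variant applied to the three titles from the question. The function name extract_release_year and the constant YEAR_RE are hypothetical names chosen for this example, and the regex is written as a raw string, which is an assumption about style rather than something the original code does.

```python
import re
from datetime import datetime

# Greedy .* first consumes the whole string and then backtracks from the
# end, so the capturing group ends up on the LAST year-like token.
# YEAR_RE is a hypothetical name used only in this sketch.
YEAR_RE = re.compile(r'.*[\[\(]?((?:19[0-9]|20[01])[0-9])[\]\)]?', re.DOTALL)

def extract_release_year(title):
    """Return the last year (1900-2019, not in the future) found in title, or ""."""
    match = YEAR_RE.search(title)
    if not match:
        return ""
    year = match.group(1)  # digits only; the brackets sit outside the group
    if 1900 <= int(year) <= datetime.now().year:
        return year
    return ""

for title in (
    '2012.(2009).3D.1080p.BRRip.SBS.x264',
    'Into.The.Storm.2012.1080p.WEB-DL.AAC2.0.H264',
    '2001.A.Space.Odyssey.1968.1080p.WEB-DL.AAC2.0.H264',
):
    print(extract_release_year(title))
```

Because only the digits are captured, int() cannot fail on a stray bracket, so the try/except from the question is not needed in this sketch. Note that the pattern 20[01][0-9] still stops at 2019, exactly as in the original regex.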