Python dateutil parser, ignore non-date part of string

I am using dateutil to parse picture filenames and sort them by date. Since not all my pictures have metadata, I am relying on dateutil to guess each picture's date from its filename.

Most of my pictures are named in this format: 2007-09-10_0001.jpg, 2007-09-10_0002.jpg, etc.

fileName = os.path.splitext(file)[0]
print("Guesssing date from ", fileName)
try:
    dateString = dateParser.parse(file, fuzzy=True)
    print("Guessed date", dateString)
    year = dateString.year
    month = dateString.month
    day = dateString.day
except ValueError:
    print("Unable to determine date of ", file)

The output I am getting is this:

('Guesssing date from ', '2007-09-10_00005')
('Unable to determine date of ', '2007-09-10_00005.jpg')

Now, I could strip everything after the underscore, but I would like a more robust solution if possible, in case I have pictures in another format. I thought fuzzy=True would find any date in the string and match on that, but apparently it does not work that way.

Is there an easy way to get the parser to find anything that looks like a date and stop after that? If not, what is the easiest way to force the parser to ignore everything after the underscore? Alternatively, is there a way to define multiple date formats with sections to ignore?

Thanks!

Solution

You can keep "reducing" the string until the parser is able to decode it:

from dateutil import parser

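def reduce_string(string):
    # Strip trailing digits, then trailing non-digits, from the end of the string.
    # The i >= 0 checks stop the index from running past the start of the string.
    i = len(string) - 1
    while i >= 0 and '0' <= string[i] <= '9':
        i -= 1
    while i >= 0 and not '0' <= string[i] <= '9':
        i -= 1
    return string[:i + 1]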

def find_date(string):
    while string:
        try:
            dateString = parser.parse(string, fuzzy=True)
            year = dateString.year
            month = dateString.month
            day = dateString.day
            return (year, month, day)
        except (ValueError, OverflowError):
            # Not decodable yet: fall through and try a shorter string.
            pass

        string = reduce_string(string)

    return None

date = find_date('2007-09-10_00005')
if date:
    print(date)
else:
    print("can't decode")

The idea is to remove the end of the string (any trailing digits, then any trailing non-digits) until the parser can decode what is left as a valid date.
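
If you would rather declare the formats you expect, as the question also asks, one possible alternative is to pull the date-looking part out with a regular expression and hand only that part to datetime.strptime. This is a minimal sketch using the standard library instead of dateutil. The names DATE_PATTERNS and extract_date are made up for illustration, and the second pattern (eight consecutive digits) is only an assumed example of "another format", not something taken from the question:

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime format) pairs; extend these to match your own filenames.
DATE_PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),  # e.g. 2007-09-10_0001
    (re.compile(r"\d{8}"), "%Y%m%d"),                # e.g. IMG_20070910_0001 (assumed format)
]

def extract_date(name):
    """Return the first valid date found in name as a datetime.date, or None."""
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(name):
            try:
                return datetime.strptime(match.group(0), fmt).date()
            except ValueError:
                # Matched the pattern but is not a real date (e.g. month 13).
                continue
    return None

print(extract_date('2007-09-10_00005'))  # 2007-09-10
```

Everything outside the matched span, including the part after the underscore, is ignored. Patterns are tried in order, so list the most specific ones first.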