python, python-2.7, python-dateutil

Python dateutil parser, ignore non-date part of string


I am using dateutil to parse picture filenames and sort them according to date. Since not all my pictures have metadata, dateutil is trying to guess where to put them.

Most of my pictures are named in this format: 2007-09-10_0001.jpg, 2007-09-10_0002.jpg, etc.

fileName = os.path.splitext(file)[0]
print("Guesssing date from ", fileName)
try:
    dateString = dateParser.parse(file, fuzzy=True)
    print("Guessed date", dateString)
    year=dateString.year
    month = dateString.month
    day=dateString.day
except ValueError:
    print("Unable to determine date of ", file)

The return I am getting is this:

('Guesssing date from ', '2007-09-10_00005')
('Unable to determine date of ', '2007-09-10_00005.jpg')

Now, I should be able to strip everything after the underscore, but I wanted a more robust solution if possible, in case I have pictures in another format. I thought fuzzy would try to find any date in the string and match on that, but apparently it does not work that way.

Is there an easy way to get the parser to find anything that looks like a date and stop after that? If not, what is the easiest way to force the parser to ignore everything after the underscore? Alternatively, is there a way to define multiple date formats with sections to ignore?

Thanks!


Solution

  • You can try to "reduce" the string as long as you can't decode it:

    from dateutil import parser
    
    def reduce_string(string):
        # Strip the trailing run of digits, then the trailing run of
        # non-digits. The bounds checks stop at the start of the string.
        i = len(string) - 1
        while i >= 0 and '0' <= string[i] <= '9':
            i -= 1
        while i >= 0 and not ('0' <= string[i] <= '9'):
            i -= 1
        return string[:i + 1]
    
    def find_date(string):
        # Retry on progressively shorter strings until one parses.
        while string:
            try:
                dateString = parser.parse(string, fuzzy=True)
                return (dateString.year, dateString.month, dateString.day)
            except ValueError:
                pass
    
            string = reduce_string(string)
    
        return None
    
    date = find_date('2007-09-10_00005')
    if date:
        print(date)
    else:
        print("can't decode")
    

    The idea is to remove the end of the string (any trailing digits, then any trailing non-digits) until the parser can decode what is left as a valid date.
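
The question also asks about defining multiple date formats with sections to ignore. One possible way to do that without dateutil is to pair a regular expression with a `strptime` format for each filename layout and take the first match that is a real date. This is only a sketch: the `DATE_PATTERNS` list, the second layout (`IMG_20070910_0001`) and the function name `find_date_by_pattern` are assumptions made for illustration, not something from the original post.

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime format) pairs; extend for other filename layouts.
DATE_PATTERNS = [
    (re.compile(r'\d{4}-\d{2}-\d{2}'), '%Y-%m-%d'),  # e.g. 2007-09-10_0001
    (re.compile(r'\d{8}'), '%Y%m%d'),                # e.g. IMG_20070910_0001 (assumed layout)
]

def find_date_by_pattern(name):
    """Return (year, month, day) from the first date-like substring, else None."""
    for regex, fmt in DATE_PATTERNS:
        for match in regex.finditer(name):
            try:
                parsed = datetime.strptime(match.group(0), fmt)
            except ValueError:
                continue  # looked like a date but is not one, e.g. month 13
            return (parsed.year, parsed.month, parsed.day)
    return None

print(find_date_by_pattern('2007-09-10_00005'))
print(find_date_by_pattern('IMG_20070910_0001'))
print(find_date_by_pattern('holiday'))
```

Everything outside the matched substring is ignored, so the trailing counter never reaches the date parser. The trade-off compared with the fuzzy approach is that every layout has to be listed explicitly.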