python, python-dateutil

dateutil and leap years


If I have the following list of strings:

a = ['Loc_RaffertytoLong_2004_02_21',
 'Loc_RaffertytoLong_2004_02_22',
 'Loc_RaffertytoLong_2004_02_23',
 'Loc_RaffertytoLong_2004_02_24',
 'Loc_RaffertytoLong_2004_02_26',
 'Loc_RaffertytoLong_2004_02_27',
 'Loc_RaffertytoLong_2004_02_28',
 'Loc_RaffertytoLong_2004_02_29']

And I try to parse the date using dateutil:

from dateutil import parser as dparse
for i in a:
    print(dparse.parse(i,fuzzy=True))

I get the printout:

2019-02-21 00:00:00
2019-02-22 00:00:00
2019-02-23 00:00:00
2019-02-24 00:00:00
2019-02-26 00:00:00
2019-02-27 00:00:00
2019-02-28 00:00:00

And the error:

ValueError: ('Unknown string format:', 'Loc_RaffertytoLong_2004_02_29')

I am not sure why, since 2004 is a leap year.


Solution

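  • If you look at your output, dateutil is interpreting your dates as dates in 2019, not 2004. The year in the string is not being picked up, so the missing year is filled in from the parser's default, which is the current date (2019 when this was run). 2019 is not a leap year, so February 29 is rejected.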

    I was able to get your code to succeed by changing the line:

    print(dparse.parse(i,fuzzy=True))
    

    to:

    print(dparse.parse('-'.join(i.split('_')[2:])))
    

    and when I run the whole block, I get the output:

    2004-02-21 00:00:00
    2004-02-22 00:00:00
    2004-02-23 00:00:00
    2004-02-24 00:00:00
    2004-02-26 00:00:00
    2004-02-27 00:00:00
    2004-02-28 00:00:00
    2004-02-29 00:00:00
    

    Interestingly, if we join on underscores like so:

    print(dparse.parse('_'.join(i.split('_')[2:])))
    

    the dates are again interpreted as being in 2019. This suggests the issue is with how dateutil handles underscores.


    You can also simply replace the underscores with dashes:

    from dateutil import parser
    for i in a:
        print(parser.parse(i.replace('_','-'), fuzzy=True))
    

    This prints the same output as above.
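
If the names always end in a YYYY_MM_DD group, as they do in the question, fuzzy parsing can be avoided altogether. The following is a minimal sketch of that alternative using only the standard library (a regular expression plus datetime.strptime), not dateutil. The helper name parse_suffix_date and the pattern DATE_SUFFIX are hypothetical names introduced here for illustration, and the sketch assumes the date is always the last part of the string.

```python
import re
from datetime import datetime

# Matches a trailing YYYY_MM_DD group, as in the names from the question.
DATE_SUFFIX = re.compile(r"(\d{4})_(\d{2})_(\d{2})$")


def parse_suffix_date(name):
    """Return the date encoded at the end of `name` (hypothetical helper)."""
    match = DATE_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"no YYYY_MM_DD suffix in {name!r}")
    # strptime validates the calendar date: 2004_02_29 is accepted because
    # 2004 is a leap year, while 2019_02_29 would raise ValueError.
    return datetime.strptime(match.group(0), "%Y_%m_%d")


names = ['Loc_RaffertytoLong_2004_02_28', 'Loc_RaffertytoLong_2004_02_29']
for name in names:
    print(parse_suffix_date(name))
```

Because the format is stated explicitly, the year can never be silently replaced by the current one; a string that does not match raises an error instead.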