
python dateutil.parser wrong (??) parsing


I am trying the following (Python 3.6):

import dateutil.parser as dp
t1 = '0001-04-23T02:25:43.511Z'
t2 = '0001-04-23T01:25:43.511Z'
print(dp.parse(t1))
print(dp.parse(t2))

which gives me

0001-04-23 02:25:43.511000+00:00
0023-01-04 01:25:43.511000+00:00

In various similar cases, when the year has the form 00XY and the hour is also XY, the parser seems to produce the wrong output: here the year 0001 combined with the hour 01 yields the date 0023-01-04 instead of 0001-04-23. Am I missing something, or is this a bug?


Solution

  • This was a bug in dateutil that has since been fixed (the initial work and the fix for this specific edge case landed in separate changes). Upgrading to python-dateutil>=2.7.0 will fix your issue.

    import dateutil
    import dateutil.parser as dp
    
    print(dateutil.__version__)
    # 2.7.2
    
    t1 = '0001-04-23T02:25:43.511Z'
    t2 = '0001-04-23T01:25:43.511Z'
    
    print(dp.parse(t1))
    # 0001-04-23 02:25:43.511000+00:00
    
    print(dp.parse(t2))
    # 0001-04-23 01:25:43.511000+00:00
    

    I do not recommend using yearfirst as a workaround: it changes how your other datetime strings are parsed, and it is essentially an implementation detail that it works at all in the buggy case (the bug involves interpreting 0001 as being equivalent to 01, which it is not).

    If you do know that you have an ISO-8601 formatted datetime, dateutil.parser.isoparse will be faster and stricter, and does not have this bug. It was also introduced in version 2.7.0:

    from dateutil.parser import isoparse
    
    print(isoparse('0001-04-23T02:25:43.511Z'))
    # 0001-04-23 02:25:43.511000+00:00
    
    print(isoparse('0001-04-23T01:25:43.511Z'))
    # 0001-04-23 01:25:43.511000+00:00
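
To illustrate the "stricter" point, here is a minimal sketch of a strict-first parsing helper, assuming python-dateutil>=2.7.0 is installed. The name parse_timestamp and its strict flag are hypothetical and not part of dateutil; the helper only wraps the real isoparse and parse functions. It tries the ISO-8601 parser first and falls back to the heuristic parser only when explicitly asked to.

```python
from dateutil.parser import isoparse, parse


def parse_timestamp(text, strict=True):
    """Parse a timestamp, preferring the strict ISO-8601 parser.

    Hypothetical helper for illustration; not part of dateutil.
    """
    try:
        return isoparse(text)
    except ValueError:
        if strict:
            raise
        # Fall back to the lenient, heuristic parser only on request.
        return parse(text)


# The two strings from the question; they differ only in the hour.
t1 = parse_timestamp('0001-04-23T02:25:43.511Z')
t2 = parse_timestamp('0001-04-23T01:25:43.511Z')
print(t1)
print(t2)

# A non-ISO string is rejected in strict mode instead of being guessed at.
try:
    parse_timestamp('April 23, 2018')
except ValueError:
    print('not an ISO-8601 string')

# With strict=False the same string goes through the heuristic parser.
print(parse_timestamp('April 23, 2018', strict=False))
```

Keeping the fallback opt-in means a malformed timestamp fails loudly by default, rather than being silently reinterpreted the way the buggy case in the question was.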