If I have the following list of strings:
a = ['Loc_RaffertytoLong_2004_02_21',
'Loc_RaffertytoLong_2004_02_22',
'Loc_RaffertytoLong_2004_02_23',
'Loc_RaffertytoLong_2004_02_24',
'Loc_RaffertytoLong_2004_02_26',
'Loc_RaffertytoLong_2004_02_27',
'Loc_RaffertytoLong_2004_02_28',
'Loc_RaffertytoLong_2004_02_29']
And I try to parse the date using dateutil:
from dateutil import parser as dparse
for i in a:
    print(dparse.parse(i, fuzzy=True))
I get the printout:
2019-02-21 00:00:00
2019-02-22 00:00:00
2019-02-23 00:00:00
2019-02-24 00:00:00
2019-02-26 00:00:00
2019-02-27 00:00:00
2019-02-28 00:00:00
And the error:
ValueError: ('Unknown string format:', 'Loc_RaffertytoLong_2004_02_29')
I am not sure why, since 2004 is a leap year.
If you look at your output, dateutil is interpreting your dates as dates in 2019 (which is not a leap year), so February 29 is not a valid date in that year.
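The leap-year point can be checked with the standard library alone. This is only a minimal sketch; the variable names are illustrative:

```python
import calendar
from datetime import datetime

# 2004 is a leap year, 2019 is not.
print(calendar.isleap(2004))  # True
print(calendar.isleap(2019))  # False

# February 29 exists in 2004 ...
leap_day = datetime(2004, 2, 29)

# ... but not in 2019, so building that date raises ValueError.
try:
    datetime(2019, 2, 29)
    feb29_2019_ok = True
except ValueError:
    feb29_2019_ok = False
print(feb29_2019_ok)  # False
```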
I was able to get your code to succeed by changing the line:
print(dparse.parse(i,fuzzy=True))
to:
print(dparse.parse('-'.join(i.split('_')[2:])))
and when I run the whole block, I get the output:
2004-02-21 00:00:00
2004-02-22 00:00:00
2004-02-23 00:00:00
2004-02-24 00:00:00
2004-02-26 00:00:00
2004-02-27 00:00:00
2004-02-28 00:00:00
2004-02-29 00:00:00
Interestingly, if we join on underscores like so:
print(dparse.parse('_'.join(i.split('_')[2:])))
it also interprets the dates as being in the year 2019. This makes me think the issue is with how dateutil handles underscores.
You can also simply replace the underscores with dashes:
from dateutil import parser
for i in a:
    print(parser.parse(i.replace('_', '-'), fuzzy=True))
This prints the same output as above.
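If the date always sits at the end of the string in YYYY_MM_DD form (an assumption based on the sample list), fuzzy parsing can be avoided altogether with an explicit format. The sketch below uses only the standard library; `DATE_SUFFIX` and `parse_suffix_date` are hypothetical names introduced for illustration, and the list is shortened to two entries:

```python
import re
from datetime import datetime

a = ['Loc_RaffertytoLong_2004_02_21',
     'Loc_RaffertytoLong_2004_02_29']

# Assumed layout: the string ends with YYYY_MM_DD.
DATE_SUFFIX = re.compile(r'(\d{4}_\d{2}_\d{2})$')

def parse_suffix_date(name):
    """Return the trailing YYYY_MM_DD of `name` as a datetime."""
    match = DATE_SUFFIX.search(name)
    if match is None:
        raise ValueError(f'no trailing date in {name!r}')
    # An explicit format leaves nothing to guess, so the year cannot
    # silently fall back to a default value.
    return datetime.strptime(match.group(1), '%Y_%m_%d')

for i in a:
    print(parse_suffix_date(i))
```

Because the format is spelled out, a string that does not match fails with a clear error instead of being parsed with a default year.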