I have a number of strings that have different date formats in them. I would like to be able to extract the date from the string. For example:
"Today is August 2012. Tomorrow isn't"
"Another day 12 August, another time"
"12/08 is another format"
"have another ? 08/12/12 could be"
"finally august 12 would be"
What I expect to get from these strings, in order, is 2012-08-01 00:00:00, 2013-08-12 00:00:00, 2013-08-12 00:00:00, 2012-08-12 00:00:00, 2013-08-12 00:00:00.
I currently have this code:
from dateutil import parser
print(parser.parse("Today is August 2012. Tomorrow isn't", fuzzy=True))
You will see from this that the date prints as 2012-08-27 00:00:00 (because today is the 27th of the month). What I would want in this example is 2012-08-01 00:00:00.
How do I force it to always put the first of the month if a day is not given? (For example if I give August 2012 it should return 2012-08-01, if I give it 12 August 2012 it should return 2012-08-12.)
Use the default argument to set the default date; any field missing from the string is taken from it. This should handle all the cases except the third one, which is somewhat ambiguous and probably needs some parser tweaking or a mind reader:
In [15]: from datetime import datetime
In [16]: from dateutil import parser
In [17]: DEFAULT_DATE = datetime(2013,1,1)
In [18]: dates=["Today is August 2012. Tomorrow isn't",
...: "Another day 12 August, another time",
...: "12/08 is another format",
...: "have another ? 08/12/12 could be",
...: "finally august 12 would be"]
In [19]: for date in dates:
...: print(parser.parse(date, fuzzy=True, default=DEFAULT_DATE))
...:
2012-08-01 00:00:00
2013-08-12 00:00:00
2013-12-08 00:00:00 # wrong
2012-08-12 00:00:00
2013-08-12 00:00:00
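For the ambiguous third case, one possible tweak is dateutil's dayfirst flag, which tells the parser to read an ambiguous pair like 12/08 as day/month. The sketch below wraps the call in a helper; the name extract_date and the choice of 2013-01-01 as the default are assumptions made for illustration, not part of the original code.

```python
from datetime import datetime

from dateutil import parser

# Assumed default: fields missing from the text (year, month, or day)
# are filled in from this date, so a missing day becomes the 1st.
DEFAULT_DATE = datetime(2013, 1, 1)


def extract_date(text, dayfirst=False):
    """Fuzzy-parse a date out of free text (hypothetical helper).

    dayfirst=True makes an ambiguous numeric pair such as "12/08"
    resolve as day/month instead of month/day.
    """
    return parser.parse(
        text, fuzzy=True, default=DEFAULT_DATE, dayfirst=dayfirst
    )


# No day in the text, so the day comes from DEFAULT_DATE.
print(extract_date("August 2012"))

# Ambiguous numeric date, read as 12 August because of dayfirst=True.
print(extract_date("12/08 is another format", dayfirst=True))
```

Note that dayfirst=True probably cannot be applied to every string blindly: it would also flip the fourth example, 08/12/12, to 8 December, so the flag has to be chosen per input or per data source.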