Parse a date from a complex string using Python


I have a number of strings that have different date formats in them. I would like to be able to extract the date from the string. For example:

  • Today is August 2012. Tomorrow isn't
  • Another day 12 August, another time
  • 12/08 is another format
  • have another ? 08/12/12 could be
  • finally august 12 would be

The results I would expect from these strings are, respectively, 2012-08-01 00:00:00, 2013-08-12 00:00:00, 2013-08-12 00:00:00, 2012-08-12 00:00:00, 2013-08-12 00:00:00.

I currently have this code:

from dateutil import parser
print(parser.parse("Today is August 2012. Tomorrow isn't", fuzzy=True))

This prints 2012-08-27 00:00:00, because today is the 27th of the month and the missing day is filled in from the current date. What I would want in this example is 2012-08-01 00:00:00.

How do I force it to always put the first of the month if a day is not given? (For example if I give August 2012 it should return 2012-08-01, if I give it 12 August 2012 it should return 2012-08-12.)


Solution

  • Use the default argument of parser.parse to supply the fields that are missing from the string. This handles every case except the third, which is ambiguous (12/08 can be read as 12 August or as 8 December) and probably needs some parser tweaking or a mind reader:

    In [15]: from datetime import datetime
    
    In [16]: from dateutil import parser
    
    In [17]: DEFAULT_DATE = datetime(2013,1,1)
    
    In [18]: dates=["Today is August 2012. Tomorrow isn't",
        ...:        "Another day 12 August, another time",
        ...:        "12/08 is another format",
        ...:        "have another ? 08/12/12 could be", 
        ...:        "finally august 12 would be"]
    
    
    In [19]: for date in dates:
        ...:     print(parser.parse(date, fuzzy=True, default=DEFAULT_DATE))
        ...:     
    2012-08-01 00:00:00
    2013-08-12 00:00:00
    2013-12-08 00:00:00  # wrong: expected 2013-08-12
    2012-08-12 00:00:00
    2013-08-12 00:00:00
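
To see why the default date fixes the missing-day problem, here is a minimal sketch of the idea using only the standard library. It is not dateutil's actual code: fill_from_default is a hypothetical helper, and the dictionaries stand in for whatever fields the parser found in the text. The point is only that fields found in the string overwrite the matching fields of the default, and everything else is kept.

```python
from datetime import datetime


def fill_from_default(parsed_fields, default):
    """Hypothetical helper, not part of dateutil.

    Overwrites the fields of `default` with every field that was
    actually found in the text (value is not None).
    """
    present = {name: value for name, value in parsed_fields.items()
               if value is not None}
    return default.replace(**present)


DEFAULT_DATE = datetime(2013, 1, 1)

# "August 2012": year and month found, day missing, so day comes from the default
august_2012 = fill_from_default({"year": 2012, "month": 8, "day": None},
                                DEFAULT_DATE)

# "12 August": month and day found, year missing, so year comes from the default
august_12 = fill_from_default({"year": None, "month": 8, "day": 12},
                              DEFAULT_DATE)

print(august_2012)  # 2012-08-01 00:00:00
print(august_12)    # 2013-08-12 00:00:00
```

Without an explicit default, the current date plays the role of DEFAULT_DATE, which is where the 27 in the question came from. For the third string, parser.parse also takes a dayfirst flag; passing dayfirst=True should make 12/08 read as 12 August, but it would presumably also turn 08/12/12 into 8 December, so it is not a blanket fix.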