Parsing dates in Python


I have a list of dates. Some of them parse with dateutil (from dateutil import parser), but others don't. The dates that fail to parse are:

date1 = 'Tue Feb 10  2015 12 52pm IST'
date2 = '10 February  15  08 35am'
date3 = '2015 02 10 08 24 26 UTC'

I parse the dates as follows:

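import re
from dateutil import parser

try:
    # Replace every character except letters, digits, newlines and '.' with a space
    date = re.sub(r'[^a-zA-Z0-9\n\.]', ' ', date)
    print(date)
    print(parser.parse(date).date())
except Exception as e:
    print(e)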

How can I parse all of these date formats? The dates were scraped from a web page.

The final output should be in the format "Monday, 09 Feb".


Solution

  • Don't remove so much information. Leave the colons (:) in. Your regex removes them, and I bet they were there before you clobbered them. Your dates parse fine when the time separators are present:

    >>> from dateutil.parser import parse
    >>> date1 = 'Tue Feb 10  2015 12 52pm IST'
    >>> parse(date1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/dateutil/parser.py", line 743, in parse
        return DEFAULTPARSER.parse(timestr, **kwargs)
      File "/Users/mpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/dateutil/parser.py", line 310, in parse
        ret = default.replace(**repl)
    ValueError: hour must be in 0..23
    >>> date1_with_colon = 'Tue Feb 10  2015 12:52pm IST'
    >>> parse(date1_with_colon)
    datetime.datetime(2015, 2, 10, 12, 52)
    >>> date2_with_colon = '10 February  15  08:35am'
    >>> parse(date2_with_colon)
    datetime.datetime(2015, 2, 10, 8, 35)
    >>> date3_with_colon = '2015 02 10 08:24:26 UTC'
    >>> parse(date3_with_colon)
    datetime.datetime(2015, 2, 10, 8, 24, 26, tzinfo=tzutc())
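A minimal sketch of the advice above, assuming the scraped strings contained colons before the cleanup step. The raw input string and the helper names clean and format_day are hypothetical; the original page text is not shown in the question. The character class is the one from the question with ':' added, and the strftime pattern is one way to get the "Monday, 09 Feb" style output that was asked for (weekday and month names follow the active locale, English by default).

```python
import re
from datetime import datetime


def clean(raw):
    # Same substitution as in the question, but ':' is now on the keep list,
    # so time separators such as 12:52 survive the cleanup.
    return re.sub(r'[^a-zA-Z0-9\n.:]', ' ', raw)


def format_day(dt):
    # %A = full weekday name, %d = zero-padded day, %b = abbreviated month name
    return dt.strftime('%A, %d %b')


# Hypothetical raw value; the cleaned result is what would be handed to parse()
cleaned = clean('Tue, Feb 10, 2015 12:52pm IST')
print(cleaned)

# Stand-in for the datetime that parse() returned for date1_with_colon above
dt = datetime(2015, 2, 10, 12, 52)
print(format_day(dt))
```

With dateutil installed, the cleaned string would go to parse() as in the session above, and the resulting datetime to format_day.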