python date nlp datefinder

Identify and Extract Date from String - Python


I am looking to identify and extract a date from a number of different strings. The dates may not be formatted the same. I have been using the datefinder package but I am having some issues saving the output.

Goal: Extract the date from a string, which may be formatted in a number of different ways (e.g. April 22, 4/22 or 22-Apr). If a date is found, append it to the date list; if there is no date, append 'None' instead.

Please see the examples below.

Example 1: (This returns a date, but does not get appended to my list)


import datefinder

extracted_dates = []
sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'

matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)

Example 2: (This does not return a date, and does not get appended to my list)

import datefinder

extracted_dates = []
sample_text = 'As of the date, there were 28 dogs at the kennel.'

matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)

Solution

  • I tried datefinder, but I could not find a quick, general way to extract the actual date from your examples.

    Instead I used the dateparser package, specifically its search_dates function.

    I have only tested it briefly, and only on your two examples.

    from dateparser.search import search_dates
    
    sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'
    extracted_dates = []
    
    # search_dates returns a list of (matched substring, datetime.datetime) tuples,
    # or None when no date is found.
    dates = search_dates(sample_text)
    
    if dates is not None:
        for _, found in dates:
            extracted_dates.append(str(found))
    else:
        extracted_dates.append('None')
    
    print(extracted_dates)
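
As a side note on why the original loop never appends 'None': datefinder.find_dates returns a generator, and when the text contains no date the generator is simply empty, so the loop body runs zero times and the `match == None` branch is never reached. The sketch below illustrates that with the standard library only. `toy_find_dates` is a hypothetical stand-in that recognises just the "Month D, YYYY" form, and `collect_dates` is an illustrative helper name, not part of either package. I would expect datefinder.find_dates to work as the `finder` argument too, but that is an assumption I have not tested.

```python
import re
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
PATTERN = re.compile(r"(?:%s) \d{1,2}, \d{4}" % "|".join(MONTHS))


def toy_find_dates(text):
    """Hypothetical stand-in for a date finder: lazily yields datetime objects."""
    for m in PATTERN.finditer(text):
        yield datetime.strptime(m.group(0), "%B %d, %Y")


def collect_dates(text, finder):
    """Return every date found as a string, or ['None'] if nothing was found."""
    # Materialise the generator first: an empty generator would otherwise
    # skip the loop body entirely and leave no chance to add the fallback.
    found = [str(d) for d in finder(text)]
    return found if found else ['None']


extracted_dates = []
extracted_dates += collect_dates(
    'As of February 27, 2019 there were 28 dogs at the kennel.', toy_find_dates)
extracted_dates += collect_dates(
    'As of the date, there were 28 dogs at the kennel.', toy_find_dates)
print(extracted_dates)
```

The key design choice is checking for "no results" after the iteration rather than inside it, which is the same thing the `dates is not None` check does for search_dates.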