Getting a date from a string text with re.findall()


I'm trying to extract dates from a block of text, but I have no idea how to solve this problem:

The dates I'm looking for are in the format 19 Oct. 20 or 19 Oct. 2020.

To achieve that I use the following code:

    re.findall(r'\d*\d (?:%s)\.? \d{2,4}\b' % '|'.join(m.title().rstrip('.') for m in calendar.month_abbr[1:]), string)

The problem arises when the text contains something like 19 Oct 16:35, in which case re.findall() returns 19 Oct 16.

How can I get it to only return what I am looking for?

Thanks!


Solution

  • You can make the match fail when the two digits are followed by a colon and a digit:

    r'\d*\d (?:%s)\.? (?:\d{4}\b|\d{2}\b(?!:\d))'
    

    The (?:\d{4}\b|\d{2}\b(?!:\d)) group matches either four digits followed by a word boundary, or two digits followed by a word boundary but not by a colon and a digit.
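
As a minimal sketch, the pattern above can be plugged into the code from the question like this. The sample text and the names `months`, `pattern`, and `text` are made up for illustration, and the month abbreviations are assumed to come out in English (the default locale unless the program changes it):

```python
import calendar
import re

# Month abbreviations joined into an alternation: Jan|Feb|...|Dec
# (same construction as in the question; assumes English month names).
months = '|'.join(m.title().rstrip('.') for m in calendar.month_abbr[1:])

# Day, month abbreviation with an optional dot, then either a 4-digit year
# or a 2-digit year that is not immediately followed by ":" and a digit.
pattern = re.compile(
    r'\d*\d (?:%s)\.? (?:\d{4}\b|\d{2}\b(?!:\d))' % months
)

# Hypothetical sample text containing both wanted formats and a time.
text = 'Posted 19 Oct. 20, edited 19 Oct. 2020, last seen 19 Oct 16:35.'

print(pattern.findall(text))
# ['19 Oct. 20', '19 Oct. 2020']
```

The `19 Oct 16:35` fragment is skipped because the negative lookahead `(?!:\d)` rejects the two digits `16` when they are followed by `:3`. A side effect of splitting `\d{2,4}` into `\d{4}` and `\d{2}` is that three-digit "years" no longer match either.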