I'm trying to extract dates from a string of text, but I have no idea how to solve this problem.
The dates I'm looking for have the format 19 Oct. 20 or 19 Oct. 2020.
To achieve that I use the following code:
'''re.findall(r'\d*\d (?:%s)\.? \d{2,4}\b' % '|'.join(m.title().rstrip('.') for m in calendar.month_abbr[1:]),string)'''
The problem comes when something like 19 Oct 16:35 appears in the text: re.findall() then returns 19 Oct 16.
How can I get it to only return what I am looking for?
Thanks!
You can make the match fail when the two-digit year is followed by a colon and a digit:
r'\d*\d (?:%s)\.? (?:\d{4}\b|\d{2}\b(?!:\d))'
The (?:\d{4}\b|\d{2}\b(?!:\d)) part matches either four digits followed by a word boundary, or two digits followed by a word boundary but not followed by : and a digit.
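Here is a minimal sketch of how that pattern could be plugged into the code from the question. The names DATE_RE, find_dates and sample are only illustrative, and it assumes calendar.month_abbr yields English abbreviations (it follows the current locale, which is the C locale by default):

```python
import calendar
import re

# Month alternation: Jan|Feb|...|Dec (assumes English month abbreviations).
months = '|'.join(m.title().rstrip('.') for m in calendar.month_abbr[1:])

# A four-digit year, or a two-digit year that is not followed by ":<digit>".
DATE_RE = re.compile(r'\d*\d (?:%s)\.? (?:\d{4}\b|\d{2}\b(?!:\d))' % months)


def find_dates(text):
    """Return every date-like substring found in text."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return DATE_RE.findall(text)


sample = "Seen 19 Oct. 20, 19 Oct. 2020 and 19 Oct 16:35 in the log."
print(find_dates(sample))  # ['19 Oct. 20', '19 Oct. 2020']
```

The time-like 19 Oct 16:35 is skipped because the lookahead (?!:\d) rejects the two-digit branch, and the four-digit branch cannot match 16:3.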