I was trying to extract a date with the format '%B %d, %Y' from a string
(e.g. 'November 1, 1960' from 'November 1, 1960 ( 1960-11-01 )').
Now I have:
import re
from datetime import datetime
word_months = r'January|February|March|April|May|June|July|August|September|October|November|December'
word_day_year = r'\d+,\s\d{4}'
word_date = rf'{word_months}({word_day_year})'
s = 'November 1, 1960 ( 1960-11-01 )'
re.search(word_date, s).group()
The output is 'November' instead of 'November 1, 1960'. What have I done wrong?
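Alternation (|) has the lowest precedence in a regex, so in your pattern ({word_day_year}) attaches only to December, and every other month name matches on its own. The pattern also has nothing to match the space between the month and the day.
Group the months, and account for the space after the month / before the day.
Consider limiting the day to two digits, and grouping both the day and the year for further validation.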
import re
word_months = r'(January|February|March|April|May|June|July|August|September|October|November|December)'
word_day_year = r'\s(\d{1,2}),\s(\d{4})'
word_date = rf'{word_months}{word_day_year}'
m = re.search(word_date, 'November 1, 1960 ( 1960-11-01 )')
print(f'Primary match: <{m.group()}>')
print('Sub matches:')
for g in m.groups():
    print(f'<{g}>')
Primary match: <November 1, 1960>
Sub matches:
<November>
<1>
<1960>
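As for the "further validation" mentioned above, one possible approach is to hand the matched text to datetime.strptime, which rejects impossible dates such as 'February 30, 2020'. This is only a sketch: the helper name extract_date is made up for illustration, and '%B' assumes an English locale.

```python
import re
from datetime import datetime

# Non-capturing group so the alternation does not swallow the rest of the pattern.
WORD_MONTHS = (r'(?:January|February|March|April|May|June|July|'
               r'August|September|October|November|December)')
WORD_DATE = rf'{WORD_MONTHS}\s\d{{1,2}},\s\d{{4}}'


def extract_date(text):
    """Return the first 'Month D, YYYY' date in text as a datetime, or None.

    Hypothetical helper, not part of the original answer.
    """
    m = re.search(WORD_DATE, text)
    if m is None:
        return None
    try:
        # strptime raises ValueError for dates that cannot exist.
        return datetime.strptime(m.group(), '%B %d, %Y')
    except ValueError:
        return None


print(extract_date('November 1, 1960 ( 1960-11-01 )'))
print(extract_date('February 30, 2020'))
```

The first call should give a datetime for 1960-11-01 and the second None, since the regex accepts 'February 30' but strptime does not.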
Also consider re.match, or otherwise anchor the expression (if appropriate).
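To illustrate the difference anchoring makes, here is a small sketch comparing re.match (which only matches at the start of the string) with re.search (which scans the whole string). The 'Born ' prefix is an invented test input, not something from the question.

```python
import re

MONTHS = (r'(January|February|March|April|May|June|July|'
          r'August|September|October|November|December)')
WORD_DATE = rf'{MONTHS}\s(\d{{1,2}}),\s(\d{{4}})'

s = 'November 1, 1960 ( 1960-11-01 )'

# re.match succeeds only if the pattern matches at position 0.
at_start = re.match(WORD_DATE, s)
print(at_start.group())

# With leading text, re.match fails while re.search still finds the date.
not_at_start = re.match(WORD_DATE, 'Born ' + s)
found = re.search(WORD_DATE, 'Born ' + s)
print(not_at_start)
print(found.group())
```

Whether anchoring is appropriate depends on the input: if the date always comes first, re.match avoids picking up a date that appears later in the string by accident.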