Tags: python, datetime, python-re

Extracting dates with format '%B %d, %Y' from a string


I am trying to extract a date in the format '%B %d, %Y' from a string (e.g. 'November 1, 1960' from 'November 1, 1960 ( 1960-11-01 )').

This is what I have so far:

import re
from datetime import datetime

word_months = r'January|February|March|April|May|June|July|August|September|October|November|December'
word_day_year = r'\d+,\s\d{4}'
word_date = rf'{word_months}({word_day_year})'
s = 'November 1, 1960 ( 1960-11-01 )'
re.search(word_date, s).group()

The output is 'November' instead of 'November 1, 1960'. What have I done wrong?


Solution

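  • Group the months, and account for the space after the month / before the day. Without a group, the alternation binds loosest, so the day-and-year part attaches only to 'December' and every other month name matches on its own.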

    Consider limiting the day to two digits, and grouping both the day and the year for further validation.

    import re
    
    word_months = r'(January|February|March|April|May|June|July|August|September|October|November|December)'
    word_day_year = r'\s(\d{1,2}),\s(\d{4})'
    word_date = rf'{word_months}{word_day_year}'
    
    m = re.search(word_date, 'November 1, 1960 ( 1960-11-01 )')
    
    print(f'Primary match: <{m.group()}>')
    print('Sub matches:')
    
    for g in m.groups():
        print(f'<{g}>')
    
    Output:
    
    Primary match: <November 1, 1960>
    Sub matches:
    <November>
    <1>
    <1960>
    

    Also consider re.match, which only matches at the start of the string, or otherwise anchor the expression (if appropriate).
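
As a follow-up to the validation and anchoring suggestions, here is a minimal sketch that anchors the pattern with re.match and then hands the matched text to datetime.strptime. The helper name parse_word_date is hypothetical (it is not from the question), and the sketch assumes an English locale, since %B depends on the locale's month names.

```python
import re
from datetime import datetime

# A non-capturing group keeps the alternation together without adding a sub-match.
MONTHS = r'(?:January|February|March|April|May|June|July|August|September|October|November|December)'
WORD_DATE = re.compile(rf'{MONTHS}\s\d{{1,2}},\s\d{{4}}')


def parse_word_date(text):
    """Hypothetical helper: return a datetime for a leading 'Month D, YYYY' date, or None."""
    # match() only succeeds at the start of the string; use search() to look anywhere.
    m = WORD_DATE.match(text)
    if m is None:
        return None
    try:
        # %B is the full month name; strptime rejects impossible dates such as February 30.
        return datetime.strptime(m.group(), '%B %d, %Y')
    except ValueError:
        return None


print(parse_word_date('November 1, 1960 ( 1960-11-01 )'))  # 1960-11-01 00:00:00
```

The regex only checks the shape of the text, so leaving the calendar check to strptime avoids encoding month lengths in the pattern. If the date can appear mid-string, swapping match for search should be enough.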