python regex python-re

Finding dates in text using regex


I want to find all dates in a text that are not preceded by the word Effective. For example, I have the following line:

FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022

My regex should return ['January 7, 2022', 'January 5, 2022']

How can I do this in Python?

My attempt:

>>> import re
>>> rule = '((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]

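But it doesn't work: the date after Effective is still matched, only without its first letter ('anuary 1, 2022'), and every match comes back twice as a tuple because of the two capturing groups.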


Solution

  • You can use

    \b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)
    

    Details:

    • \b - a word boundary
    • (?<!Effective\s) - a negative lookbehind that fails the match if there is Effective + a whitespace char immediately to the left of the current location
    • [A-Za-z]{3,9} - three to nine ASCII letters
    • \s* - zero or more whitespace characters
    • \d{1,2} - one or two digits
    • \s*,\s* - a comma surrounded by zero or more whitespace characters
    • \d{4} - four digits
    • (?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right.
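
    In Python, the pattern above could be applied roughly as follows. This is a minimal sketch that reuses the sample line from the question; the names DATE_NOT_AFTER_EFFECTIVE and dates are only illustrative.

```python
import re

# Pattern from the answer, written as a raw string so that \b is a word
# boundary rather than a backspace character.
# DATE_NOT_AFTER_EFFECTIVE is an illustrative name, not part of the question.
DATE_NOT_AFTER_EFFECTIVE = re.compile(
    r"\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)"
)

text = (
    "FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 "
    "ALASKA DISCLAIMER The January 5, 2022"
)

# The pattern has no capturing groups, so findall returns plain strings.
dates = DATE_NOT_AFTER_EFFECTIVE.findall(text)
print(dates)  # ['January 7, 2022', 'January 5, 2022']
```

    The \b is what stops the match from starting at "anuary": without it, the lookbehind at that position sees "ffective J" rather than "Effective ", so it passes and the date is matched minus its first letter. Note also that Python's re module only accepts fixed-width lookbehinds, so Effective\s works there but Effective\s+ would not.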