python regex datefinder

Find date from image/text


I have dates in formats like the ones below, and I need a regex that finds all of them:

   12-23-2019
   29 10 2019
   1:2:2018
   9/04/2019
   22.07.2019

Here's what I did: first I removed all spaces from the text, which then looks like this:

   12-23-2019291020191:02:2018

And this is my regex:

    re.findall(r'((\d{1,2})([.\/-])(\d{2}|\w{3,9})([.\/-])(\d{4}))',new_text)

It finds 12-23-2019, 9/04/2019 and 22.07.2019, but not 29 10 2019 or 1:02:2018.


Solution

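  • The separator class [.\/-] in your pattern contains neither a space nor a colon, and stripping the spaces first removes the separator from dates like 29 10 2019. Run the search on the original text instead. You may use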

    (?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)
    


    Details

    • (?<!\d) - no digit right before
    • \d{1,2} - 1 or 2 digits
    • ([.:/ -]) - a dot, colon, slash, space or hyphen (captured in Group 1)
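    • (?:\d{1,2}|\w{3,}) - 1 or 2 digits, or 3 or more word chars (letters, digits or underscores), e.g. a month name
    • \1 - same value as in Group 1, so both separators must be the same character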
    • \d{4} - four digits
    • (?!\d) - no digit allowed right after
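
    To make the roles of \1 and the two lookarounds concrete, here is a small sketch. The helper find_dates and the test strings are made up for illustration; only the pattern comes from this answer.

```python
import re

pattern = r'(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)'

def find_dates(text):
    """Return every substring of text matched by the pattern (illustrative helper)."""
    return [m.group() for m in re.finditer(pattern, text)]

# \1 forces the second separator to be the same character as the first.
same_sep = find_dates('9/04/2019')
mixed_sep = find_dates('9/04-2019')

# The lookarounds reject a date embedded in a longer run of digits.
extra_leading_digit = find_dates('112-23-2019')
extra_trailing_digit = find_dates('12-23-20191')

print(same_sep)              # ['9/04/2019']
print(mixed_sep)             # []
print(extra_leading_digit)   # []
print(extra_trailing_digit)  # []
```

    Without the lookarounds, the last two strings would yield partial matches such as 12-23-2019 cut out of a longer number.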

    Python sample usage:

    import re
    text = 'Aaaa 12-23-2019, bddd   29 10 2019 <===   1:2:2018'
    pattern = r'(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)'
    results = [x.group() for x in re.finditer(pattern, text)]
    print(results) # => ['12-23-2019', '29 10 2019', '1:2:2018']
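
    One caveat worth checking against your data: \w{3,} also matches digits, so the middle field can swallow a number such as 2019. If that is a problem, a possible tightening (my assumption, not part of the pattern above) is to allow only ASCII letters there. The names loose, strict and find_dates below are hypothetical, as is the sample text.

```python
import re

# Pattern from the answer: the middle field allows any 3+ word characters.
loose = r'(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)'
# Hypothetical stricter variant: 1-2 digits, or 3+ ASCII letters (e.g. a month name).
strict = r'(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|[A-Za-z]{3,})\1\d{4}(?!\d)'

def find_dates(pattern, text):
    """Return every substring of text matched by pattern (illustrative helper)."""
    return [m.group() for m in re.finditer(pattern, text)]

text = 'due 12-Oct-2019, ref 1-2019-2020, paid 3 November 2019'

loose_hits = find_dates(loose, text)
strict_hits = find_dates(strict, text)

print(loose_hits)   # ['12-Oct-2019', '1-2019-2020', '3 November 2019']
print(strict_hits)  # ['12-Oct-2019', '3 November 2019']
```

    Neither variant checks that the day and month are in range; if that matters, the matches could be validated afterwards, for example with datetime.strptime.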