python, regex, python-re

Regex for extracting a string starting with a particular word and ending with a year


INPUT 1: The string is enclosed CASE NO.: Appeal (civil) 648 of 2007 in between.

OUTPUT 1: Appeal (civil) 648 of 2007

INPUT 2: The string is enclosed CASE NO.: Appeal (civil) 6408 of 2007 in between.

OUTPUT 2: Appeal (civil) 6408 of 2007

I want to extract the string that follows CASE NO.: (matched case-insensitively) and ends with the year, which is the second number to occur after it.

I have tried the following code.

import re

case_no = re.search(r'(?=Case No)(\w+\W+)*?\b\d{4}\b', contents, re.IGNORECASE)
if case_no:
    print(case_no.group(0))
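
For reference, here is a small sketch of what this attempt returns on the two inputs. The pattern is unchanged; the variable names `pattern`, `samples` and `results` are only illustrative. The lookahead does not consume CASE NO.:, so the prefix stays in the match, and the lazy repetition stops at the first four-digit number it meets, which for INPUT 2 is 6408 rather than the year.

```python
import re

# The pattern from the attempt, unchanged.
pattern = re.compile(r'(?=Case No)(\w+\W+)*?\b\d{4}\b', re.IGNORECASE)

samples = [
    "The string is enclosed CASE NO.: Appeal (civil) 648 of 2007 in between.",
    "The string is enclosed CASE NO.: Appeal (civil) 6408 of 2007 in between.",
]

# group(0) is the whole match, including the text seen by the lookahead.
results = [pattern.search(s).group(0) for s in samples]
for r in results:
    print(r)
# CASE NO.: Appeal (civil) 648 of 2007
# CASE NO.: Appeal (civil) 6408
```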

Solution

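  • I would use a greedy dot here to match up to the last four-digit number on the same line after CASE NO.:, and read the result from the capture group so that the CASE NO.: prefix is left out. (A lazy dot would stop at 6408 in the second input.)

    import re

    inp = "The string is enclosed CASE NO.: Appeal (civil) 6408 of 2007 in between."
    m = re.search(r'\bCASE NO\.:\s*(.*\b\d{4}\b)', inp, flags=re.IGNORECASE)
    print(m.group(1))  # Appeal (civil) 6408 of 2007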
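
If the text after the case reference can contain another four-digit number (a later date, for example), a pattern that counts the numbers explicitly may be safer. The following is only a sketch, assuming the reference always has the shape "text, number, text, four-digit year"; the helper name `extract_case_no` and the constant `CASE_NO_RE` are hypothetical.

```python
import re

# Assumed shape after "CASE NO.:": non-digits, first number, non-digits, 4-digit year.
# Note that \D also matches newlines, so this can span lines.
CASE_NO_RE = re.compile(r'\bCASE NO\.:\s*(\D*\d+\D+\d{4})\b', re.IGNORECASE)

def extract_case_no(text):
    """Return the case reference following 'CASE NO.:', or None if there is none."""
    m = CASE_NO_RE.search(text)
    return m.group(1) if m else None

# A later year on the same line is not swallowed by the match.
print(extract_case_no("CASE NO.: Appeal (civil) 648 of 2007, decided in 2010."))
# Appeal (civil) 648 of 2007
```

Because the year is anchored as the second number rather than the last four-digit number on the line, trailing text does not change the result.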