regex python-3.x spacy data-extraction

Extracting only the specified value from text using Python


I have a text file containing the data shown below. I need to extract all lines that contain the word signed:

The document was signed on July 12

The document was signed by Charlie

This document was assigned to John

The document was preassigned to Amanda

Expected output:

The document was signed on July 12

The document was signed by Charlie

If I use:

for line in file:
    if "signed" in line:
        print(line)

it prints all the lines, including the ones containing assigned and preassigned.


Solution

  • This is easily done by using a word boundary \b. \bsigned will match signed, but not assigned.

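    You can use re.search(r"\bsigned", line). Note that the pattern is the first argument and the string to search is the second, and that the raw string prefix (r"...") is needed so that Python does not interpret \b as a backspace character. The surrounding .* is unnecessary, because re.search already scans the whole line for a match.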
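
As a minimal sketch of the word-boundary approach above, the snippet below filters the four sample lines from the question. The names SIGNED, lines_with_signed, and sample are made up for this illustration and are not part of the original post.

```python
import re

# Raw string so that \b is a regex word boundary, not a backspace.
SIGNED = re.compile(r"\bsigned")


def lines_with_signed(lines):
    """Return the lines in which "signed" begins at a word boundary."""
    return [line.rstrip("\n") for line in lines if SIGNED.search(line)]


# The sample data from the question, as it would be read from a file.
sample = [
    "The document was signed on July 12\n",
    "The document was signed by Charlie\n",
    "This document was assigned to John\n",
    "The document was preassigned to Amanda\n",
]

for line in lines_with_signed(sample):
    print(line)
```

An open file object can be passed in place of sample, since iterating over a file yields its lines. If words that merely start with signed should also be excluded, a trailing \b can be added to the pattern (r"\bsigned\b").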