Tags: python, text-extraction

Extracting specific range of text from a file


I'm trying to help my wife review documents for work. Her paragraphs of notes are divided into different categories, and I'm trying to extract each category as a separate string and save it to a different text file so that I can do other things with them later. An example paragraph is:

Observations of Client Behavior: Overall interfering behavior data trends are as followed: THIS IS THE DESIRED TEXT. Observations of Client's response to skill acquisition: Overall skill acquisition data trends ....

and I'm trying to extract just the text that comes after "Overall interfering behavior data trends are as followed:" and stops right before "Observations of Client's response to skill acquisition:".

I've experimented with regex without success. Any pointers in the right direction would be much appreciated, thanks!


Solution

  • Adapted from the post "Regular expression to return all characters between two special characters".

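    import re
    
    # Text immediately before and after the section we want to extract.
    # (.*?) is non-greedy, so the capture stops at the first end marker.
    pat = r"Overall interfering behavior data trends are as followed:(.*?)Observations of Client's response to skill acquisition:"
    
    with open("filename.txt", "r") as file:  # Insert the file name here
        for line in file:
            match = re.search(pat, line)  # look for both markers on this line
            if match:  # skip lines that don't contain the section
                print(match.group(1).strip())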
    

    which prints:

    THIS IS THE DESIRED TEXT.
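If a note wraps across several lines in the file, a line-by-line loop won't see both markers on the same line. Below is a minimal sketch under that assumption: it reads the whole file at once, uses `re.DOTALL` so `.` also matches newlines, and writes every captured section to a separate file, which was the original goal. The helper names `extract_sections` and `extract_file`, and any file paths you pass to them, are hypothetical.

```python
import re

# Marker phrases taken from the example note in the question
START = "Overall interfering behavior data trends are as followed:"
END = "Observations of Client's response to skill acquisition:"

# re.escape makes the markers match literally; (.*?) is non-greedy so each
# match stops at the first END after its START; re.DOTALL lets "." match
# newlines, so the captured section may span several lines.
PATTERN = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)


def extract_sections(text):
    """Return every block of text found between START and END, stripped."""
    return [m.strip() for m in PATTERN.findall(text)]


def extract_file(in_path, out_path):
    """Hypothetical helper: read in_path and write each section, followed by a newline, to out_path."""
    with open(in_path, encoding="utf-8") as f:
        sections = extract_sections(f.read())
    with open(out_path, "w", encoding="utf-8") as out:
        for section in sections:
            out.write(section + "\n")
    return sections


if __name__ == "__main__":
    sample = (
        "Observations of Client Behavior: Overall interfering behavior data "
        "trends are as followed: THIS IS THE DESIRED TEXT. Observations of "
        "Client's response to skill acquisition: Overall skill acquisition data trends ...."
    )
    print(extract_sections(sample))  # ['THIS IS THE DESIRED TEXT.']
```

Because `findall` returns every match, a file containing many notes yields one entry per note. The same pattern approach works for the other categories: swap in that category's start and end phrases.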