How to extract text between two words in a document with the Python 3 re module?


I want to extract the text between Love and OK with the following code, but it does not work.

document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."

x = re.search("^Love.*OK$", document)

I want to get the following text: apples oranges pears jeep plane car water cola coffee


Solution

  • We can use a slightly modified version of your regex pattern with re.findall to find every matching substring, then join the resulting list into a single string.

    import re

    document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."
    # Capture the text between each Love ... OK pair (non-greedy).
    matches = re.findall(r'\bLove (.*?) OK\b', document)
    print(' '.join(matches))
    

    This prints:

    apples oranges pears jeep plane car water cola coffee
    

    Explanation:

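    The regex pattern \bLove (.*?) OK\b captures the content between each Love ... OK pair of markers. The lazy quantifier .*? stops at the nearest OK, so the three pairs in this document produce three separate substrings rather than one long match. The original pattern ^Love.*OK$ fails because ^ and $ anchor it to the start and end of the string, and the document neither starts with Love nor ends with OK. We then combine the list returned by re.findall into a single string using ' '.join().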
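
To make the difference between the patterns concrete, here is a minimal sketch that runs three variants against the same document. The variable names anchored, greedy, and lazy are only illustrative and are not part of the original question or answer.

```python
import re

document = (
    "This is a document with random words Love apples oranges pears OK "
    "some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."
)

# Original attempt: ^ and $ tie the match to the start and end of the string,
# but the document neither starts with "Love" nor ends with "OK".
anchored = re.search(r"^Love.*OK$", document)

# Greedy .* runs from the first "Love" to the last "OK": one long match.
greedy = re.findall(r"\bLove (.*) OK\b", document)

# Lazy .*? stops at the nearest "OK": one match per Love ... OK pair.
lazy = re.findall(r"\bLove (.*?) OK\b", document)

print(anchored)      # None
print(len(greedy))   # 1
print(lazy)          # ['apples oranges pears', 'jeep plane car', 'water cola coffee']
```

Note that . does not match a newline by default, so if the markers can sit on different lines, passing flags=re.DOTALL to re.findall would likely be needed.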