python, regex, text, nlp, text-mining

How to Extract Words Following a Key Word


I'm currently trying to extract 4 words after "our", but keep getting words after "hour" and "your" as well.

For example: "my family will send an email in 2 hours when we arrive at." (text in the column)

What I want: nan (since there is no "our")

What I get: when we arrive at (because "hour" has "our" in it)

I tried the following code and still have no luck.

our = r'our\W+(?P<after>(?:\w+\W+){,4})'
Reviews_C['Review_for_Fam'] = Reviews_C.ReviewText2.str.extract(our, expand=True)

Can you please help?

Thank you!


Solution

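  • You need to make sure "our" is preceded by whitespace or the start of the text, like this:

    our = r'(?:^|\s)our\W+(?P<after>(?:\w+\W+){,4})'

    The (?:^|\s) prefix is the part to adjust. It is a non-capturing group, so str.extract returns only the named group "after" rather than one column per group. The \W+ after "our" already covers trailing whitespace, so no separate optional space group is needed. This example only handles whitespace and the start of the text; you may need to extend the prefix to allow quotes or other special characters before "our".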
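As a quick check of the boundary idea, here is a minimal sketch using only the standard `re` module. The helper name `words_after_our` and the sample sentences are made up for illustration; the pattern uses a non-capturing prefix so that only the named group `after` is captured.

```python
import re

# "our" must sit at the start of the text or right after whitespace,
# so "hour" and "your" are not treated as matches.
OUR_PATTERN = re.compile(r'(?:^|\s)our\W+(?P<after>(?:\w+\W+){,4})')

def words_after_our(text):
    """Return up to four words following a standalone "our", or None if absent."""
    match = OUR_PATTERN.search(text)
    if match is None:
        return None
    return match.group('after').strip()

# Hypothetical sample rows standing in for the review text column.
samples = [
    "my family will send an email in an hour when we arrive at.",
    "we loved our stay at the hotel very much.",
]
results = [words_after_our(s) for s in samples]
print(results)
```

The same pattern string can be passed to pandas `Series.str.extract`, which leaves NaN in rows without a match. If "our" may also appear right after a quote or bracket, the word-boundary form `\bour\b` is a common alternative to the whitespace prefix.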