I'm currently trying to extract 4 words after "our", but keep getting words after "hour" and "your" as well.
For example: "my family will send an email in 2 hours when we arrive at." (text in the column)
What I want: nan (since there is no "our")
What I get: when we arrive at (because "hours" has "our" in it)
I tried the following code and still have no luck.
our = r'our\W+(?P<after>(?:\w+\W+){,4})'
Reviews_C['Review_for_Fam'] = Reviews_C.ReviewText2.str.extract(our, expand=True)
Can you please help?
Thank you!
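You need to make sure "our" is preceded by whitespace or the start of the string, like this:
our = r'(?:^|\s+)our(?:\s+)?\W+(?P<after>(?:\w+\W+){,4})'
Specifically, the (?:^|\s+)our(?:\s+)? part is where you need to experiment. The groups around "our" are non-capturing so that "after" stays the only extracted column.
This example only handles whitespace and the start of the string, so you might need to extend it to allow quotes or other special characters before "our".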
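If quotes or other punctuation can appear right before "our", one possible alternative (a different technique from the whitespace check above) is the regex word boundary \b. The sketch below assumes a small made-up DataFrame named reviews in place of Reviews_C; the sample rows are hypothetical and only illustrate the matching behaviour.

```python
import pandas as pd

# Hypothetical sample data standing in for Reviews_C.ReviewText2.
reviews = pd.DataFrame({
    "ReviewText2": [
        "my family will send an email in 2 hours when we arrive at.",
        "Thanks for your help with our booking.",
        '"our room was clean and quiet."',
    ]
})

# \b matches only at a word boundary, so the "our" inside "hours" or
# "your" is skipped, while "our" after a space or a quote still matches.
our = r"\bour\b\W+(?P<after>(?:\w+\W+){,4})"

# expand=False returns a Series because there is a single capture group;
# str.strip() removes the trailing separator left by the last \W+.
reviews["Review_for_Fam"] = (
    reviews["ReviewText2"].str.extract(our, expand=False).str.strip()
)

print(reviews["Review_for_Fam"])
```

Rows without a standalone "our" come back as NaN, which is the result the question asks for.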