How to extract the current sentence and its surrounding sentences around a particular word with Python?


Is there a way to get the sentences surrounding a selected word? Say the goal is to get the sentence that contains the word "champion" in the example below, along with the previous and next sentences, regardless of the word's position or tag, or how many times it is repeated.

text = "This is sentence 1. We are the champions. This is sentence 3. This is sentence 4. This is sentence 5. You are champions too."

In the example above, the word "champion" appears in sentences 2 and 6, so we want sentences 1, 2, 3, 5 and 6, and we want to exclude sentence 4.

How can we achieve this with spaCy or other tools?


Solution

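  • The following NLTK-based function returns every sentence that contains the word, together with the sentence before and the sentence after it. Matching is on whole tokens, so call it with "champions" for the example text.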

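    from nltk.tokenize import sent_tokenize, word_tokenize

    # Needs NLTK's sentence tokenizer data ("punkt" or "punkt_tab", depending on the NLTK version).

    def surrounding_sentences(text, word):
        """Return each sentence containing `word`, plus its neighbours, in document order."""
        sentences = sent_tokenize(text)
        word = word.lower()

        keep = set()  # indices of sentences to return; a set avoids duplicates
        for i, sentence in enumerate(sentences):
            # Case-insensitive match on whole tokens.
            if word in word_tokenize(sentence.lower()):
                if i > 0:  # previous sentence, including the very first one
                    keep.add(i - 1)
                keep.add(i)
                if i + 1 < len(sentences):  # next sentence
                    keep.add(i + 1)

        # Sort the indices so the result follows the original sentence order.
        return [sentences[i] for i in sorted(keep)]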
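
  • If you would rather not depend on NLTK, here is a minimal standard-library sketch of the same idea. The name sentence_window and the window parameter are made up for illustration. It assumes sentences end in ".", "!" or "?" followed by whitespace, which is a naive rule that will mis-split abbreviations such as "Dr. Smith". It also matches by prefix at a word boundary, so "champion" finds "champions" as well.

```python
import re

def sentence_window(text, word, window=1):
    """Return sentences containing `word` plus `window` neighbours on each side, in order."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Prefix match at a word boundary, case-insensitive: "champion" also hits "champions".
    pattern = re.compile(r"\b" + re.escape(word), re.IGNORECASE)

    keep = set()  # indices of sentences to return
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            # Clamp the window to the valid index range.
            keep.update(range(max(0, i - window), min(len(sentences), i + window + 1)))

    # Sorted indices preserve the original sentence order.
    return [sentences[i] for i in sorted(keep)]

text = ("This is sentence 1. We are the champions. This is sentence 3. "
        "This is sentence 4. This is sentence 5. You are champions too.")
result = sentence_window(text, "Champion")
for s in result:
    print(s)
```

    On the example text this prints sentences 1, 2, 3, 5 and 6 and leaves out sentence 4. Setting window=2 would also pull in sentence 4, since it is two sentences away from both matches.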