Tags: python, string, nlp, spacy

Removing stop words from strings without affecting other words using Python spaCy


I am using spaCy. I have a list of sentences, and I want to remove stop words and punctuation from them.

for i in sentences_list:
    for token in docfile:
        if token.is_stop or token.is_punct and token.text in i[1]:
            i[1] = i[1].replace(token.text, '')
print(sentences_list)

But this affects other words too, because replace() removes the stop word wherever it appears, even inside another word. For example, "i" is a stop word, so the word "big" becomes "bg".
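
The effect can be reproduced without spaCy. A minimal sketch, assuming a hypothetical sentence and a stop word written out by hand instead of being read from token.text:

```python
# Hypothetical stand-ins for i[1] and token.text in the loop above.
sentence = "i like big planes"
stop_word = "i"

# str.replace removes every occurrence of the substring,
# including the ones that sit inside other words.
mangled = sentence.replace(stop_word, "")
print(repr(mangled))
```

Every "i" disappears, not only the standalone word, which is why "like" and "big" are damaged as well.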


Solution

  • Instead of replacing substrings, filter at the token level and join the tokens you want to keep:

    " ".join([token.text for token in doc if not token.is_stop and not token.is_punct])
    

    Here is a complete example:

    import spacy
    nlp = spacy.load("en_core_web_sm")
    sentences_list = ["I like big planes.", "No, I saw no big flames."]
    new_sentence_list = []
    for i in sentences_list:
        doc = nlp(i)
        new_sentence_list.append(" ".join([token.text for token in doc if not token.is_stop and not token.is_punct]))
    

    The new_sentence_list is now:

    ['like big planes', 'saw big flames']
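
If the text has to be edited as a string instead of being rebuilt from tokens, matching whole words only is another option. This is a rough sketch using the standard re module, not the approach from the answer above. The helper name remove_words and the hand-written word list are assumptions for illustration and are not part of spaCy:

```python
import re

def remove_words(text, words):
    """Remove whole-word, case-insensitive matches of `words` from `text`."""
    if not words:
        return text
    # \b anchors each alternative at word boundaries,
    # so "i" does not match inside "big".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the extra whitespace left behind by the removed words.
    return " ".join(stripped.split())

print(remove_words("I like big planes", ["i", "no"]))
```

This version leaves punctuation in place, and its notion of a word is cruder than spaCy's tokenization, so the token-based filter is likely the better fit when a spaCy pipeline is already loaded.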