Remove all words that are not nouns, verbs, adjectives, adverbs, or proper names (spaCy, Python)

I wrote the code below. I want to print the words in the first 10 sentences and remove all words that are not nouns, verbs, adjectives, adverbs, or proper names, but I don't know how. Can anyone help me?

! pip install wget
import wget
url = ''
wget.download(url, 'moby_dick.txt')
documents = [line.strip() for line in open('moby_dick.txt', encoding='utf8').readlines()]

import spacy

nlp = spacy.load('en')

tokens = [[token.text for token in nlp(sentence)] for sentence in documents[:200]]
pos = [[token.pos_ for token in nlp(sentence)] for sentence in documents[:100]]


  • All you need to know is which POS tags spaCy uses to represent these categories; the spaCy documentation lists them. This code filters tokens by those tags:

    import spacy
    nlp = spacy.load('en_core_web_sm')  # you can load the model in other ways
    # POS tags to remove from each sentence
    excluded_tags = {"NOUN", "VERB", "ADJ", "ADV", "ADP", "PROPN"}
    # read the file with a context manager so it is closed afterwards
    with open('moby_dick.txt', encoding='utf8') as f:
        document = [line.strip() for line in f]
    sentences = document[:10]  # first 10 sentences
    new_sentences = []
    for sentence in sentences:
        new_sentence = []
        for token in nlp(sentence):
            # keep only tokens whose coarse POS tag is not excluded
            if token.pos_ not in excluded_tags:
                new_sentence.append(token.text)
        new_sentences.append(" ".join(new_sentence))

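    Now new_sentences holds the same sentences as before, but without any nouns, verbs, adjectives, adverbs, adpositions, or proper nouns. To keep only those words instead, as the question asks, change `not in` to `in` and drop "ADP" from the set. You can check the result by iterating over sentences and new_sentences to see the difference: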

    for old_sen, new_sen in zip(sentences, new_sentences):
        print("Before:", old_sen)
        print("After:", new_sen)

    Output:

    Before: Loomings .
    After: .
    Before: Call me Ishmael .
    After: me .
    Before: Some years ago -- never mind how long precisely -- having little or no money in my purse , and nothing particular to interest me on shore , I thought I would sail about a little and see the watery part of the world .
    After: Some -- -- or no my , and nothing to me , I I a and the the .
    Before: It is a way I have of driving off the spleen and regulating the circulation .
    After: It is a I have the and the .
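
The question asks for the opposite filter: keep only nouns, verbs, adjectives, adverbs, and proper nouns. A minimal sketch of that is below. The helper names `keep_content_words` and `filter_sentences` and the `FakeToken` stand-in are made up for illustration, and the POS tags in the demo are assigned by hand, not produced by spaCy. `filter_sentences` assumes the `en_core_web_sm` model is installed.

```python
from collections import namedtuple

# Coarse-grained POS tags to keep: nouns, verbs, adjectives, adverbs, proper nouns.
KEEP_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PROPN"}


def keep_content_words(tokens, keep_tags=KEEP_TAGS):
    """Return the text of every token whose .pos_ is in keep_tags."""
    return [token.text for token in tokens if token.pos_ in keep_tags]


def filter_sentences(sentences, model="en_core_web_sm"):
    """Tag each sentence with spaCy and keep only the content words.

    Assumes spaCy and the named model are installed; spaCy is imported
    here so the helper above can be used without it.
    """
    import spacy

    nlp = spacy.load(model)
    # nlp.pipe processes the sentences as a stream instead of one call each
    return [keep_content_words(doc) for doc in nlp.pipe(sentences)]


# Hand-tagged stand-in tokens (hypothetical, not spaCy output) to show the filter.
FakeToken = namedtuple("FakeToken", ["text", "pos_"])
demo = [
    FakeToken("Call", "VERB"),
    FakeToken("me", "PRON"),
    FakeToken("Ishmael", "PROPN"),
    FakeToken(".", "PUNCT"),
]
print(keep_content_words(demo))  # ['Call', 'Ishmael']
```

With the real file you would call `filter_sentences(document[:10])` and print each returned list. Note that auxiliaries have their own tag, AUX, which is why "is" survives in the output above; add "AUX" to `KEEP_TAGS` if you want auxiliaries counted as verbs.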