
Remove stopwords with nltk.corpus from a list of lists


I have a list of lists, each containing the separated words of a review, like this:

texts = [['fine','for','a','night'],['it','was','good']]

I want to remove all stopwords using the nltk.corpus package and put the remaining words back into the list. The end result should be a list of lists of words without stopwords. This is what I tried:

import nltk
nltk.download('stopwords')  # download only the stopwords corpus
from nltk.corpus import stopwords
stopwords = stopwords.words('english')
words_reviews = []

for review in texts:
    wr = []
    for word in review:
        if word not in stopwords:
            wr.append(word)
    words_reviews.append(wr)  # once per review, after the inner loop finishes

This code worked at first, but now I get the error AttributeError: 'list' object has no attribute 'words', referring to stopwords. I made sure that all packages are installed. What could be the problem?


Solution

  • The problem is that you redefine stopwords in your code:

    from nltk.corpus import stopwords
    stopwords=stopwords.words('english')
    

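    After the first line, stopwords is a corpus reader with a words() method. After the second line, the same name refers to a plain list of strings, so any later call to stopwords.words() (for example, rerunning that line on its own, or calling it again in a later notebook cell) raises this AttributeError. Give the list a different name, such as stop_words, so the reader stays available.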

    Also, checking membership in a list scans every element, which is slow; a set lookup takes roughly constant time, so you'll get much better performance with this:

    stop_words = set(stopwords.words('english'))
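Putting both suggestions together, a minimal sketch of the corrected code might look like this. So that it runs without NLTK or a corpus download, CorpusReaderStub is a hypothetical stand-in for the stopwords corpus reader, with a tiny illustrative word list (not NLTK's real one); with NLTK you would use from nltk.corpus import stopwords directly. remove_stopwords is likewise a hypothetical helper that replaces the nested loop with a nested list comprehension.

```python
class CorpusReaderStub:
    """Hypothetical stand-in for nltk.corpus.stopwords (not the real reader)."""

    def words(self, lang):
        # Tiny illustrative subset, NOT NLTK's actual English stopword list
        return ['a', 'for', 'it', 'was'] if lang == 'english' else []


# Plays the role of `from nltk.corpus import stopwords`
stopwords = CorpusReaderStub()

# Keep the reader and the word collection under different names,
# and use a set so each membership test is an average O(1) hash lookup.
stop_words = set(stopwords.words('english'))


def remove_stopwords(texts, excluded):
    """Return a new list of lists with every word in `excluded` removed."""
    return [[word for word in review if word not in excluded]
            for review in texts]


texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]
words_reviews = remove_stopwords(texts, stop_words)
print(words_reviews)  # [['fine', 'night'], ['good']]

# The reader was never rebound, so calling .words() again still works
print(len(stopwords.words('english')))  # 4
```

Because the comprehension builds exactly one inner list per review, each filtered review appears once in the result, and texts itself is left unchanged.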