WordListCorpusReader is not iterable


I am new to Python and NLTK. I have a file called reviews.csv which consists of comments extracted from Amazon. I have tokenized the contents of this CSV file and written the tokens to a file called csvfile.csv. Here's the code:

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer
import csv  # comma-separated values
from nltk.corpus import stopwords
ps = PorterStemmer()
stop_words = set(stopwords.words("english"))
with open ('reviews.csv') as csvfile:
    readCSV = csv.reader(csvfile,delimiter='.')    
    for lines in readCSV:
        word1 = word_tokenize(str(lines))
        print(word1)
    with open('csvfile.csv','a') as file:
        for word in word1:
            file.write(word)
            file.write('\n')
    with open ('csvfile.csv') as csvfile:
        readCSV1 = csv.reader(csvfile)
    for w in readCSV1:
        if w not in stopwords:
            print(w)

I am trying to perform stemming on csvfile.csv, but I get this error:

  Traceback (most recent call last):
    File "/home/aarushi/test.py", line 25, in <module>
      if w not in stopwords:
  TypeError: argument of type 'WordListCorpusReader' is not iterable

Solution

  • When you did

    from nltk.corpus import stopwords
    

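    stopwords is a name bound to NLTK's CorpusReader object (a WordListCorpusReader, as the traceback shows), not to a list of words. That object cannot be iterated over, so the membership test w not in stopwords raises the TypeError.

    The actual stopwords (i.e. a list of stopwords) you're looking for are created when you call: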

    stop_words = set(stopwords.words("english"))
    

    So when checking whether a word in your list of tokens is a stopword, test it against stop_words rather than stopwords:

    from nltk.corpus import stopwords
    stop_words = set(stopwords.words("english"))
    for w in tokenized_sent:
        if w not in stop_words:
            pass # Do something.
    

    To avoid confusion with the imported stopwords module object, I usually name the actual set of stopwords stoplist:

    from nltk.corpus import stopwords
    stoplist = set(stopwords.words("english"))
    for w in tokenized_sent:
        if w not in stoplist:
            pass # Do something.
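
The difference between the reader object and the word set can be reproduced without downloading any NLTK data. The following is only a minimal sketch under stated assumptions: FakeCorpusReader is a hypothetical stand-in written for this example (it is not NLTK's class) that, like the object in the traceback, offers a words() method but supports neither iteration nor the in operator, and remove_stopwords is an illustrative helper name. Lowercasing each token before the check is a design choice of this sketch, made because NLTK's English stopword list is lowercase.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for the object bound to nltk.corpus.stopwords."""

    def words(self, language):
        # Tiny illustrative word list; the real corpus is much larger.
        return ["the", "is", "a", "this", "and"]


def remove_stopwords(tokens, stoplist):
    """Return the tokens not found in stoplist, compared in lowercase."""
    return [t for t in tokens if t.lower() not in stoplist]


stopwords = FakeCorpusReader()

# Membership against the reader itself fails, as in the question.
try:
    "the" in stopwords
    reader_check_failed = False
except TypeError:
    reader_check_failed = True

# Membership against the set built from .words() works.
stoplist = set(stopwords.words("english"))
kept = remove_stopwords(["This", "is", "a", "great", "product"], stoplist)

print(reader_check_failed)  # True
print(kept)                 # ['great', 'product']
```

With the real library, the same helper should work unchanged if you replace the stand-in with from nltk.corpus import stopwords and build stoplist exactly as shown in the answer.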