python pandas nlp tokenize stop-words

Python NLTK loop prints column headers instead of the values


I have tokenized sentences in a CSV file, but when I try to remove the stop words inside the for loop, it stops printing the words and instead prints the column header for every sentence. Any idea where the error is in the last line?

for review in tokenized_docs:
    new_review = []
    for token in review:
        new_token = x.sub(u'', token)
        if not new_token == u'':
            new_review.append(new_token)
    tokenized_docs_no_punctuation.append(new_review)
    words=pd.DataFrame(tokenized_docs_no_punctuation)
    #print(words)
    print([word for word in words if word not in stops])

The output looks like this:

on

It should print the words, not the column header numbers.
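A minimal reproduction of the symptom, using made-up data: the two reviews and the stops set below are hypothetical stand-ins for the CSV contents. Building a DataFrame from ragged token lists pads the shorter rows with None, and iterating over a pandas DataFrame yields its column labels, so the comprehension ends up filtering the integers 0 to 4 rather than any words.

```python
import pandas as pd

# Hypothetical tokenized reviews standing in for the data read from the CSV.
tokenized_docs_no_punctuation = [
    ["this", "movie", "was", "great"],
    ["i", "did", "not", "like", "it"],
]
stops = {"this", "was", "i", "did", "not", "it"}  # hypothetical stop-word set

# Ragged rows become 5 columns labelled 0..4; the short row is padded with None.
words = pd.DataFrame(tokenized_docs_no_punctuation)

# Iterating over a DataFrame yields its column labels, not its cell values,
# so every "word" here is an integer label and none of them is in stops.
result = [word for word in words if word not in stops]
print(result)
```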


Solution

  • Because words in your code is a DataFrame, iterating over it in the for loop yields its column labels (0, 1, 2, ...), so word is a column name rather than a token.

    You can use a plain list instead. For example,

    # before
    # words=pd.DataFrame(tokenized_docs_no_punctuation)
    
    # after
    words = tokenized_docs_no_punctuation[0]
    

    This worked for me.
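A sketch of the whole loop with the list-based fix applied, under stated assumptions: the sample reviews and stop words are hypothetical, and x is assumed to be a compiled regex that strips punctuation (the question does not show its definition). In practice stops would typically come from NLTK, for example set(stopwords.words('english')) after importing stopwords from nltk.corpus, which needs the stopwords corpus downloaded.

```python
import re

# Hypothetical inputs: in the question these come from the CSV file.
tokenized_docs = [
    ["this", "movie", "was", "great", "!"],
    ["i", "did", "not", "like", "it", "."],
]
# Assumption: x removes every character that is not a word character or whitespace.
x = re.compile(r"[^\w\s]")
stops = {"this", "was", "i", "did", "not", "it"}  # hypothetical stop-word set

tokenized_docs_no_punctuation = []
tokenized_docs_no_stopwords = []

for review in tokenized_docs:
    new_review = []
    for token in review:
        new_token = x.sub("", token)
        if new_token != "":  # drop tokens that were pure punctuation
            new_review.append(new_token)
    tokenized_docs_no_punctuation.append(new_review)

    # Filter the current review's token list directly (no DataFrame),
    # so each item is an actual word rather than a column label.
    filtered = [word for word in new_review if word not in stops]
    tokenized_docs_no_stopwords.append(filtered)
    print(filtered)
```

Filtering new_review directly avoids rebuilding a DataFrame on every pass of the outer loop, and, unlike indexing tokenized_docs_no_punctuation[0], it processes the review from the current iteration instead of always the first one.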