Tags: python, sentiment-analysis, vader

'list' object has no attribute 'encode': sentiment analysis


I would like to run sentiment analysis on some texts using VADER (the problem described here applies to other lexicon-based tools as well, not only VADER). However, after preprocessing the data, including tokenizing and converting to lower case (steps not shown here), I get the following error:

AttributeError: 'list' object has no attribute 'encode'

Any idea how to process the documents so that the lexicon can read the texts? Thanks.

import pandas as pd

with open('data_1.txt') as g:
    data_1 = g.read()
with open('data_2.txt') as g:
    data_2 = g.read()
with open('data_3.txt') as g:
    data_3 = g.read()

df_1 = pd.DataFrame({"text":[data_1, data_2, data_3]})

df_1.head()
                                                 text
#0  [[bangladesh, education, commission, report, m...
#1  [[english, version, glis, ministry, of, educat...
#2  [[national, education, policy, 2010, ministry,...

from nltk.sentiment.vader import SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()

df_1['Vader_sentiment'] = df_1.text.apply(lambda x: vader.polarity_scores(x)['compound'])

AttributeError: 'list' object has no attribute 'encode'


Solution

  • df_1.text is a Series in which each element is a list of lists of tokens. vader.polarity_scores expects a single string, so passing it a list (let alone a list of lists) raises the AttributeError. Convert the lists to strings and then apply VADER:

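    # Take the first inner token list of each row and join its tokens with spaces
    df_1['text_as_string'] = df_1['text'].str[0].str.join(" ")
    # Score each string and store the compound score in a new column
    df_1['Vader_sentiment'] = df_1['text_as_string'].apply(lambda x: vader.polarity_scores(x)['compound'])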
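
Note that .str[0] keeps only the first inner list of each row. If a document holds several inner lists (for example, one per sentence), you may want to join all of them before scoring. Below is a minimal sketch of one way to do that. It is not part of the original answer: the helper name tokens_to_text and the sample data are hypothetical, and it assumes each document is a string, a list of tokens, or a list of token lists.

```python
def tokens_to_text(doc):
    """Join a tokenized document back into a single string.

    Accepts a plain string, a flat list of tokens, or nested lists of tokens.
    (Hypothetical helper, for illustration only.)
    """
    if isinstance(doc, str):
        # Already a string: nothing to join
        return doc
    parts = []
    for item in doc:
        if isinstance(item, (list, tuple)):
            # Nested list: flatten it recursively
            parts.append(tokens_to_text(item))
        else:
            parts.append(str(item))
    return " ".join(parts)


# Made-up sample in the same shape as the rows shown in the question
sample = [["bangladesh", "education", "commission"], ["report", "2010"]]
print(tokens_to_text(sample))  # bangladesh education commission report 2010
```

Assuming the column really contains nested token lists as in the question, the helper could be applied with df_1['text'].apply(tokens_to_text), and the resulting strings passed to vader.polarity_scores as in the answer above.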