Tags: python, pandas, dataframe, nlp, textblob

How to split every sentence into individual words, average the per-word polarity scores for each sentence, and store the result in a new DataFrame column?


I can successfully split a sentence into its individual words and take the average of the polarity scores of every word using this code. It works great.

import statistics as s
from textblob import TextBlob

a = TextBlob("""Thanks, I'll have a read!""")
print(a)

c = []
for i in a.words:
    # note: this appends the polarity of the whole blob `a`, not of the word `i`
    c.append(a.sentiment.polarity)
d = s.mean(c)

Resulting values:

d = 0.25
a.words = WordList(['Thanks', 'I', "'ll", 'have', 'a', 'read'])

How do I apply the above code to a DataFrame that looks like this?

df

     text
1    Thanks, I’ll have a read!

so that each row gets the average of the polarity of every word?

The closest I have got is applying polarity to each whole sentence in the df:

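from textblob import TextBlob

def sentiment_calc(text):
    # Polarity of the whole sentence; None if the cell is not a string (e.g. NaN)
    try:
        return TextBlob(text).sentiment.polarity
    except TypeError:
        return None

df_sentences['sentiment'] = df_sentences['text'].apply(sentiment_calc)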

Solution

  • I have the impression that sentiment polarity only works on the TextBlob type (the items of `a.words` are `Word` objects, which have no `sentiment` attribute).

    So my idea here is to split the text into words (with the `split` function) and convert each word to a TextBlob object. This is done in the list comprehension:

    [TextBlob(x).sentiment.polarity for x in a.split()]
    

    So the whole thing looks like this:

    import statistics as s
    from textblob import TextBlob
    import pandas as pd

    def compute_mean(text):
        # Score each whitespace-separated token as its own TextBlob, then average.
        return s.mean([TextBlob(x).sentiment.polarity for x in text.split()])

    print(compute_mean("Thanks, I'll have a read!"))

    df = pd.DataFrame({'text': ["Thanks, I'll have a read!",
                                "Second sentence",
                                "a bag of apples"]})

    # Apply the per-row average and store it in a new column.
    df['score'] = df['text'].map(compute_mean)
    print(df)