Tags: python, nlp, sentiment-analysis, lexicon

How can I apply a lexicon to a list of sentences?


I have a lexicon dictionary in this shape:

6   ابن جزمه    1
7   ابو جهل -1
8   اتق الله    -1
9   اتقو الله   1

I want to create a new list containing the score of each sentence. The score is the sum of the lexicon scores of the words in the sentence, and it should be zero if none of its words are in the lexicon. When I run my code I get len(lex_score) = 3679, and after I add the elif condition I get len(lex_score) = 95079.

However, len(lex_score) should equal 6064 (one score per sentence).

lex_score = []
def lexic(text):
    for tweet in sentences:
        score = 0
        for word in tweet.split():
            if word in lexicon:
                score = score+lexicon[word]
            elif word not in lexicon:
                score = 0
                lex_score.append(score)

I want to create a new column in the DataFrame containing the score of each sentence. What am I doing wrong, and is there a better way to do this?


Solution

  • IIUC, you can just sum the scores of the valid lexicon entries in each tweet, and then append that score to lex_score once per iteration over sentences (not once per word, which is what the append inside the elif branch does).

    Note: I'm assuming text == sentences; otherwise there's a missing line where text is broken down into sentences. Either way, this basic approach should still work:

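    def lexic(text):
        lex_score = []
        for tweet in text:  # assuming sentences == text
            # Add up the scores of the words found in the (global) lexicon dict; 0 if none match.
            score = sum(lexicon[word] for word in tweet.split() if word in lexicon)
            lex_score.append(score)  # exactly one score per tweet
        return lex_score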
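Two details in the question are worth spelling out. In the original loop, `lex_score.append(score)` sits inside the `elif` branch, so it runs once for every word that is *not* in the lexicon, and `score = 0` wipes out the running total. That is why the list length tracks word counts rather than the number of sentences. Also, the sample lexicon entries are two-word phrases (e.g. `اتق الله`). If those keys really contain a space, a lookup on `tweet.split()` tokens can never match them. The sketch below also checks consecutive word pairs. The function names, the `max_ngram` parameter and the English toy data are hypothetical and are not taken from the question.

```python
from typing import Dict, List


def score_tweet(tweet: str, lexicon: Dict[str, int], max_ngram: int = 2) -> int:
    """Sum the lexicon scores of every 1..max_ngram consecutive-word phrase.

    Returns 0 when nothing in the tweet matches the lexicon.
    """
    words = tweet.split()
    total = 0
    for n in range(1, max_ngram + 1):
        # Slide a window of n consecutive words across the tweet.
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            total += lexicon.get(phrase, 0)  # unknown phrases contribute 0
    return total


def lexic(sentences: List[str], lexicon: Dict[str, int]) -> List[int]:
    # Exactly one score per sentence, so len(result) == len(sentences).
    return [score_tweet(tweet, lexicon) for tweet in sentences]


if __name__ == "__main__":
    # Hypothetical toy lexicon, including one two-word entry.
    toy_lexicon = {"good": 1, "bad": -1, "very bad": -1}
    tweets = ["good day", "very bad news", "nothing here"]
    print(lexic(tweets, toy_lexicon))  # [1, -2, 0]
```

Overlapping matches are all counted: in "very bad news", both `bad` and `very bad` contribute. If you want only the longest match to count, you need a different matching strategy. For the DataFrame column, assuming the tweets are in a column named `text` (a hypothetical name), `df['score'] = df['text'].apply(lambda t: score_tweet(t, lexicon))` adds one score per row. If the lexicon is itself a DataFrame with (hypothetical) columns `phrase` and `score`, `dict(zip(lex_df['phrase'], lex_df['score']))` converts it into the dict used here.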