I have a lexicon dictionary in this shape:
6 ابن جزمه 1
7 ابو جهل -1
8 اتق الله -1
9 اتقو الله 1
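Every entry in the sample above is a two-word phrase. Assuming the separator shown is an ordinary space, `tweet.split()` only ever yields single words, so a plain `word in lexicon` lookup can never match these entries. A minimal sketch of phrase-aware scoring, assuming lexicon keys use single spaces between words and that overlapping matches should all be counted (the function name `phrase_score` is hypothetical):

```python
def phrase_score(tweet, lexicon):
    """Sum the scores of all lexicon entries (single words or multi-word
    phrases) that occur in `tweet` as runs of whole, space-separated tokens.

    Assumptions: lexicon keys use single spaces between words, and every
    occurrence is counted, including overlapping ones (e.g. a word and a
    phrase that contains it).
    """
    tokens = tweet.split()
    # Longest entry, measured in tokens, bounds the window sizes to try.
    max_len = max((len(entry.split()) for entry in lexicon), default=0)
    score = 0
    for n in range(1, max_len + 1):
        # Slide an n-token window across the tweet and look each one up;
        # windows that are not lexicon entries contribute 0.
        for i in range(len(tokens) - n + 1):
            score += lexicon.get(" ".join(tokens[i:i + n]), 0)
    return score


# The four entries from the sample above.
lexicon = {"ابن جزمه": 1, "ابو جهل": -1, "اتق الله": -1, "اتقو الله": 1}
print(phrase_score("ابن جزمه ابو جهل ابو جهل", lexicon))  # -1
```

For thousands of tweets you would compute `max_len` once outside the function rather than on every call.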
I want to create a new list containing the score of each sentence, computed by adding up the lexicon scores of its words; if none of its words are in the lexicon, the score should be zero.
When I run my code I get len(lex_score) = 3679.
After I add the elif condition I get len(lex_score) = 95079.
But len(lex_score) should equal 6064 (one score per sentence).
```python
lex_score = []
def lexic(text):
    for tweet in sentences:
        score = 0
        for word in tweet.split():
            if word in lexicon:
                score = score+lexicon[word]
            elif word not in lexicon:
                score = 0
            lex_score.append(score)
```
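To see where a count like 95079 can come from, here is a toy reproduction of the loop above, assuming `lex_score.append(score)` sits inside the inner `for word` loop (the flattened indentation makes its exact placement uncertain). The data and the function name `lexic_per_word` are hypothetical, and `else` is used in place of the equivalent `elif word not in lexicon`:

```python
def lexic_per_word(sentences, lexicon):
    """Mirror of the loop above: appends once per *word*, not once per tweet."""
    lex_score = []
    for tweet in sentences:
        score = 0
        for word in tweet.split():
            if word in lexicon:
                score = score + lexicon[word]
            else:
                score = 0  # discards the running total for this tweet
            lex_score.append(score)  # runs for every word
    return lex_score


# Hypothetical toy data: 2 tweets, 5 words in total.
lexicon = {"good": 1, "bad": -1}
sentences = ["good good day", "bad news"]
scores = lexic_per_word(sentences, lexicon)
print(len(scores))  # 5: one entry per word, not one per tweet
print(scores)       # [1, 2, 0, -1, 0]
```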
I want to create a new column in the DataFrame containing the score of each sentence. What am I doing wrong, and is there a better way to do this?
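IIUC, you can just sum the scores of the lexicon entries found in each tweet, and then append that total to lex_score once per iteration over sentences. In your version, the elif branch resets score to 0 whenever a word is not in the lexicon, discarding the running total, and appending inside the word loop gives you one entry per word instead of one per tweet.
Note: I'm assuming text == sentences; otherwise there's a missing line where text is broken down into sentences. Either way, this basic approach should still work: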
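```python
def lexic(text):
    lex_score = []
    for tweet in text:  # assuming sentences == text
        # Sum the scores of the words found in the lexicon; sum() of an
        # empty sequence is 0, so a tweet with no lexicon words scores 0.
        score = sum(lexicon[word] for word in tweet.split() if word in lexicon)
        lex_score.append(score)  # exactly one score per tweet
    return lex_score
```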
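Since the end goal is a new DataFrame column, the same idea can be applied row by row with pandas `Series.apply`. This is a minimal sketch on toy data: the column names `term`, `score`, and `text`, and the helper `tweet_score`, are hypothetical. It also shows one way to build the `lexicon` dict if the lexicon is itself a DataFrame (the index column in the sample suggests it might be):

```python
import pandas as pd

# Hypothetical lexicon table: one column for the term, one for its score.
lex_df = pd.DataFrame({"term": ["good", "bad", "great"], "score": [1, -1, 2]})

# Turn the two columns into a plain dict: term -> score.
lexicon = dict(zip(lex_df["term"], lex_df["score"]))


def tweet_score(tweet):
    # .get(word, 0) treats unknown words as 0, so a tweet with no
    # lexicon words scores 0 instead of being skipped.
    return sum(lexicon.get(word, 0) for word in tweet.split())


# Hypothetical tweets DataFrame; "text" is an assumed column name.
df = pd.DataFrame({"text": ["good good day", "bad news", "nothing here"]})

# apply() returns one value per row, so the new column has len(df) values.
df["lex_score"] = df["text"].apply(tweet_score)
print(df)
```

Because `apply` produces exactly one value per row, the new column always lines up with the existing rows, so the length mismatch from the original loop cannot occur.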