Search code examples
pythonpandaslisttext-processing

How to build a for loop that prints the sentiment score of each string and does not produce a key error?


I have a dataset of tweets that I put into a pandas dataframe and converted each row to a string so that each row could be analysed with the my sentiment analyzer. I'm trying to print the sentiment score of each tweet using a for loop:

for row in msmarvel.Text:
    print(text_sentiment(row))

It works for the first few tweets,

2.4332083615899887
3.479569526740967
2.426372867331215
2.2458306180346703
2.2478570548004133
0.9351690267777979

but then gives this error:

KeyError                                  Traceback (most recent call last)
C:\Users\SHEHZA~1\AppData\Local\Temp/ipykernel_2420/262060431.py in <module>
      3         if word not in embeddings.index:
      4             continue
----> 5     print(text_sentiment(row))

C:\Users\SHEHZA~1\AppData\Local\Temp/ipykernel_2420/923749346.py in text_sentiment(text)
      5 def text_sentiment(text):
      6     tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
----> 7     sentiments = words_sentiment(tokens)
      8     return sentiments['sentiment'].mean()

C:\Users\SHEHZA~1\AppData\Local\Temp/ipykernel_2420/994030881.py in words_sentiment(words)
     11 
     12 def words_sentiment(words):
---> 13     vecs = embeddings.loc[words].dropna() # vectors are defined by searching words (we provide) that are in the embeddings dictionary
     14     log_odds = vector_sentiment(vecs) # vector sentiment is calculated by getting the log probability
     15     return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    929 
    930             maybe_callable = com.apply_if_callable(key, self.obj)
--> 931             return self._getitem_axis(maybe_callable, axis=axis)
    932 
    933     def _is_scalar_access(self, key: tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1151                     raise ValueError("Cannot index with multidimensional key")
   1152 
-> 1153                 return self._getitem_iterable(key, axis=axis)
   1154 
   1155             # nested tuple slicing

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis)
   1091 
   1092         # A collection of keys
-> 1093         keyarr, indexer = self._get_listlike_indexer(key, axis)
   1094         return self.obj._reindex_with_indexers(
   1095             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 1377             raise KeyError(f"{not_found} not in index")
   1378 
   1379 

KeyError: "['fbexclusive'] not in index"

The problem is there are words in some of the tweets (particularly slang words or grammatically incorrect words) that can't be analyzed with the sentiment analyzer because they are not present in the word embeddings dataframe. So I keep getting a key error.

I need to create a for loop that ignores any words that aren't in the embeddings vocabulary but still prints the sentiment score for each string otherwise. How should I do this?


Solution

  • At your sentiment functions you can use try/except concept so you can define what to do if an exception raises. It's not going to be perfect example because dont know what your functions do actually but your can try;

     def text_sentiment(text):
         try:
             tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
             sentiments = words_sentiment(tokens)
             return sentiments['sentiment'].mean()
         except KeyError:
             pass