I am using the VADER sentiment lexicon in Python's nltk library to analyze text sentiment. This lexicon does not suit my domain well, and so I wanted to add my own sentiment scores to various words. So, I got my hands on the lexicon text file (vader_lexicon.txt) to do just that. However, I do not understand the architecture of this file well. For example, a word like obliterate will have the following data in the text file: obliterate -2.9 0.83066 [-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]
Clearly the -2.9 is the average of sentiment scores in the list. But what does the 0.83066 represent?
Thanks!
According to the VADER source code, only the first number on each line is used. The rest of the line is ignored:
for line in self.lexicon_full_filepath.split('\n'):
(word, measure) = line.strip().split('\t')[0:2] # Here!
lex_dict[word] = float(measure)