I would like someone to correct my understanding of how VADER scores text. I've read an explanation of this process here, however I cannot match the compound score of test sentences to Vader's output when recreating the process it describes. Lets say we have the sentence:
"I like using VADER, its a fun tool to use"
The words VADER picks up are 'like' (+1.5 score), and 'fun' (+2.3). According to the documentation, these values are summed (so +3.8), and then normalized to a range between 0 and 1 using the following function:
(alpha = 15)
x / x2 + alpha
With our numbers, this should become:
3.8 / 14.44 + 15 = 0.1290
VADER, however, outputs the returned compound score as follows:
Scores: {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.7003}
Where am I going wrong in my reasoning? Similar questions have been asked several times, however an actual example of VADER classifying has not yet been provided. Any help would be appreciated.
It's just your normalization that is wrong. From the code the function is defined:
def normalize(score, alpha=15):
"""
Normalize the score to be between -1 and 1 using an alpha that
approximates the max expected value
"""
norm_score = score/math.sqrt((score*score) + alpha)
return norm_score
So you have 3.8/sqrt(3.8*3.8 + 15) = 0.7003