Tags: python, stocktwits, vader

Vader lexicon results don't add up to 1.0


My data is tweets from Stocktwits, and I am trying to do sentiment analysis with the VADER library in Python. The problem is that the positive, neutral and negative fields do not add up to 1.0; instead, they add up to 2.0:

{'neg': 0.0, 'neu': 2.0, 'pos': 0.0, 'compound': 0.0}

Is this normal?


Solution

  • Yes, that's normal. The example in the docs shows similar results:

    VADER is smart, handsome, and funny.----------------------------- {'pos': 0.746, 'compound': 0.8316, 'neu': 0.254, 'neg': 0.0}
    VADER is smart, handsome, and funny!----------------------------- {'pos': 0.752, 'compound': 0.8439, 'neu': 0.248, 'neg': 0.0}
    ...
    VADER is not smart, handsome, nor funny.------------------------- {'pos': 0.0, 'compound': -0.7424, 'neu': 0.354, 'neg': 0.646}
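
    A quick way to check any result dict against the docs' description is to add up only pos, neu and neg (compound is not a proportion, so it is left out). The sketch below does that for the three docs rows quoted above. The helper names and the 0.002 tolerance are my own assumptions: the values are shown with 3 decimals, so rounding alone can shift the sum by up to about 0.0015.

```python
# Example rows copied from the VADER docs output quoted above.
DOC_EXAMPLES = {
    "VADER is smart, handsome, and funny.": {"pos": 0.746, "compound": 0.8316, "neu": 0.254, "neg": 0.0},
    "VADER is smart, handsome, and funny!": {"pos": 0.752, "compound": 0.8439, "neu": 0.248, "neg": 0.0},
    "VADER is not smart, handsome, nor funny.": {"pos": 0.0, "compound": -0.7424, "neu": 0.354, "neg": 0.646},
}


def proportions_sum(scores: dict) -> float:
    """Sum of the pos/neu/neg proportions; 'compound' is excluded because it is not a proportion."""
    return scores["pos"] + scores["neu"] + scores["neg"]


def sums_to_one(scores: dict, tol: float = 0.002) -> bool:
    """True if pos + neu + neg equals 1.0 up to rounding.

    The 0.002 tolerance is an assumption that covers 3-decimal rounding
    of the three fields (at most about 0.0015 in total).
    """
    return abs(proportions_sum(scores) - 1.0) <= tol


if __name__ == "__main__":
    for sentence, scores in DOC_EXAMPLES.items():
        print(f"{sentence:<45} sum={proportions_sum(scores):.3f} ok={sums_to_one(scores)}")
```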
    

    The pos, neu, and neg scores are ratios for proportions of text that fall in each category (so these should all add up to be 1... or close to it with float operation). These are the most useful metrics if you want multidimensional measures of sentiment for a given sentence.
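
    To see why the three ratios should sum to 1, here is a simplified sketch loosely modeled on how the vaderSentiment package splits per-word valences: each positive valence v contributes v + 1, each negative one contributes v - 1 (counted by absolute value), each neutral word counts as 1, and every category is divided by the same total. The function names and the example valences are hypothetical, and the real library also applies rules (negation, boosters, punctuation emphasis) that this sketch omits.

```python
from typing import Dict, List, Tuple


def sift_sentiment_scores(valences: List[float]) -> Tuple[float, float, int]:
    """Split per-word valences into a positive sum, a negative sum and a neutral count.

    Non-zero valences get +1 (positive) or -1 (negative) so they are weighed
    against neutral words, which count as 1 each.
    """
    pos_sum = 0.0
    neg_sum = 0.0
    neu_count = 0
    for v in valences:
        if v > 0:
            pos_sum += v + 1
        elif v < 0:
            neg_sum += v - 1
        else:
            neu_count += 1
    return pos_sum, neg_sum, neu_count


def proportions(valences: List[float]) -> Dict[str, float]:
    """Turn per-word valences into pos/neu/neg ratios of one shared total."""
    if not valences:
        # No tokens: nothing to divide, so every field is 0.0
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    pos_sum, neg_sum, neu_count = sift_sentiment_scores(valences)
    total = pos_sum + abs(neg_sum) + neu_count  # always > 0 for non-empty input
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }


if __name__ == "__main__":
    # Hypothetical valences for a four-word sentence with two positive words
    print(proportions([0.0, 1.9, 0.0, 2.0]))
```

    Because all three fields share one denominator, they can only drift from 1.0 by rounding.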

    You probably want to use the compound score:

    The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.
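
    The normalization step can be sketched as follows, assuming the form used in the vaderSentiment source, x / sqrt(x² + α) with α = 15. The input valences are hypothetical and are taken to be already adjusted by VADER's rules; this is an illustration, not the library's own code.

```python
import math
from typing import List


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1].

    score / sqrt(score**2 + alpha) approaches +/-1 as |score| grows;
    alpha=15 is assumed to match the vaderSentiment default.
    """
    norm = score / math.sqrt(score * score + alpha)
    # Clamp defensively; mathematically |norm| < 1 already.
    return max(-1.0, min(1.0, norm))


def compound(valences: List[float]) -> float:
    """Sum the (already rule-adjusted) word valences, normalize, round to 4 decimals."""
    return round(normalize(sum(valences)), 4)


if __name__ == "__main__":
    # Hypothetical valences: two positive words and one neutral word
    print(compound([1.9, 0.0, 2.0]))
```

    The square-root form keeps large sums from saturating too quickly while never leaving the [-1, 1] range, which is what makes the compound score comparable across sentences of different lengths.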

    It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative.
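
    A minimal thresholding sketch, assuming the ±0.05 cutoffs commonly used with VADER (positive if compound >= 0.05, negative if compound <= -0.05, neutral otherwise). The function name is hypothetical; pick your own threshold if your Stocktwits data calls for it.

```python
def classify(compound: float, threshold: float = 0.05) -> str:
    """Label a sentence from its compound score.

    Assumed cutoffs: positive if compound >= threshold,
    negative if compound <= -threshold, neutral in between.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Compound scores taken from the docs examples and the question above
    for score in (0.8316, -0.7424, 0.0):
        print(score, classify(score))
```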