sentiment-analysis quanteda

How to compute a sentiment score for a document in quanteda?


I am new to sentiment analysis. The quanteda examples show how to output the number of positive and negative words. I tested some documents and got the output below:

Case 1

document    negative    positive
file1       28          28
file2       98          71
file3       28          22
file4       37          39
file5       7           36

or the following:

Case 2

document    negative    positive    neg_positive    neg_negative
file1       28          28          0               1
file2       98          71          0               0
file3       28          22          1               0
file4       37          39          0               1
file5       7           36          0               1

Could you tell me how to compute scores for file1 through file5 in both cases? Is it

(#positive - #negative) / #all? For file2 in Case 1, that would be (71 - 98) / (71 + 98) = -27/169 ≈ -0.16.

What about Case 2?

Thanks a lot.



Solution

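  • If you treat neg_positive as negative and neg_negative as positive, you can build an index by combining each pair of columns. This is plausible because the "neg_positive" category, for instance, matches sequences such as "not good". Here object is assumed to be the dfm produced by the dictionary lookup, containing only these four columns:

    # net sentiment as a percentage of all matched terms
    # (values above 0 mean more positive than negative matches)
    (rowSums(object[, c("positive", "neg_negative")]) -
        rowSums(object[, c("negative", "neg_positive")])) / rowSums(object) * 100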
    

    Another (better) measure is the logit scale described in William Lowe, Kenneth Benoit, Slava Mikhaylov, and Michael Laver (2011), "Scaling Policy Preferences From Coded Political Texts," Legislative Studies Quarterly 36(1): 123-155. It is log(positive / negative), computed as:

    log( rowSums(object[, c("positive", "neg_negative")]) /
         rowSums(object[, c("negative", "neg_positive")]) )
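As a worked illustration of both measures, here is a minimal base R sketch that uses the Case 2 counts from the question. The data frame `counts` is typed in by hand for the example (it is not output from quanteda), and the 0.5 added to each count in the logit scale is a smoothing assumption made here so that a document with zero matches on one side does not produce an infinite score.

```r
# Counts copied by hand from Case 2 in the question (hypothetical stand-in
# for a dictionary-applied dfm converted to a data frame).
counts <- data.frame(
  document     = paste0("file", 1:5),
  negative     = c(28, 98, 28, 37, 7),
  positive     = c(28, 71, 22, 39, 36),
  neg_positive = c(0, 0, 1, 0, 0),
  neg_negative = c(1, 0, 0, 1, 1)
)

# Negated positives ("not good") count as negative;
# negated negatives count as positive.
pos <- counts$positive + counts$neg_negative
neg <- counts$negative + counts$neg_positive

# Net sentiment as a share of all matched terms, bounded in [-1, 1].
counts$net <- (pos - neg) / (pos + neg)

# Logit scale with 0.5 smoothing (an assumption of this sketch).
counts$logit <- log((pos + 0.5) / (neg + 0.5))

print(counts[, c("document", "net", "logit")], digits = 3)
```

For file2, which has no negated matches, `net` reduces to the (71 - 98) / (71 + 98) calculation from the question. Both measures share the same sign for every document; the logit scale is unbounded, while the proportional difference is bounded in [-1, 1].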