I am new to sentiment analysis. The quanteda examples show how to output the numbers of positive and negative words. I tested some documents and got the output below:
Case 1
document  negative  positive
file1           28        28
file2           98        71
file3           28        22
file4           37        39
file5            7        36
or the output below:
Case 2
document  negative  positive  neg_positive  neg_negative
file1           28        28             0             1
file2           98        71             0             0
file3           28        22             1             0
file4           37        39             0             1
file5            7        36             0             1
Can you tell me how to compute scores for file1 through file5 in both cases? Is it (#positive - #negative) / #all? For file2 in Case 1, that would be (71 - 98) / (71 + 98) = -27/169 ≈ -0.16.
What about Case 2?
Thanks a lot.
A
If you consider neg_positive as negative, and neg_negative as positive, then you could create your index by combining the pairs of columns. This is plausible because the neg_positive category, for instance, contains sequences such as "not good".
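# positive-type counts minus negative-type counts, as a percentage of all matches,
# so that higher values mean more positive (matching the direction used in the question)
(rowSums(object[, c("positive", "neg_negative")]) -
   rowSums(object[, c("negative", "neg_positive")])) / rowSums(object) * 100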
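As a rough worked example, the sketch below applies this kind of index to the Case 2 counts from the question, using the positive-minus-negative direction the question asks about. It assumes `object` can be stood in for by a plain numeric matrix holding only the four dictionary columns; with a real dfm that has other features, `rowSums(object)` would count those too. The names `pos`, `neg`, and `score` are illustrative only.

```r
# Counts copied from Case 2 in the question. A plain matrix stands in for
# the dfm here (an assumption made for illustration).
object <- matrix(
  c(28, 28, 0, 1,
    98, 71, 0, 0,
    28, 22, 1, 0,
    37, 39, 0, 1,
     7, 36, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    paste0("file", 1:5),
    c("negative", "positive", "neg_positive", "neg_negative")
  )
)

# Treat negated negatives as positive and negated positives as negative
pos <- rowSums(object[, c("positive", "neg_negative")])
neg <- rowSums(object[, c("negative", "neg_positive")])

# Net sentiment as a percentage of all dictionary matches
score <- (pos - neg) / rowSums(object) * 100
round(score, 1)
```

For file2 the two negation columns are zero, so the result is the same -27/169 as in Case 1, expressed as a percentage (about -16). In Case 1 generally, the same formula reduces to (positive - negative) / (positive + negative).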
Another (better) measure is the logit scale described in William Lowe, Kenneth Benoit, Slava Mikhaylov, and Michael Laver (2011), "Scaling Policy Preferences From Coded Political Texts," Legislative Studies Quarterly 36(1): 123-155. This is log(positive/negative), or:
log( rowSums(object[, c("positive", "neg_negative")]) /
rowSums(object[, c("negative", "neg_positive")]) )
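One practical caveat is that the plain log ratio breaks down when a document has a zero count on either side. A minimal sketch of a smoothed variant, assuming the Case 2 counts from the question in a plain matrix and adding 0.5 to each combined count (the smoothing constant used in the logit scale of Lowe et al., as far as I recall), might look like this. The names `pos`, `neg`, and `logit_score` are illustrative only.

```r
# Counts copied from Case 2 in the question. A plain matrix stands in for
# the dfm here (an assumption made for illustration).
object <- matrix(
  c(28, 28, 0, 1,
    98, 71, 0, 0,
    28, 22, 1, 0,
    37, 39, 0, 1,
     7, 36, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    paste0("file", 1:5),
    c("negative", "positive", "neg_positive", "neg_negative")
  )
)

pos <- rowSums(object[, c("positive", "neg_negative")])
neg <- rowSums(object[, c("negative", "neg_positive")])

# Adding 0.5 to both counts keeps the ratio finite when either count is zero
logit_score <- log((pos + 0.5) / (neg + 0.5))
round(logit_score, 2)
```

Values above zero indicate more positive than negative matches, values below zero the reverse. Unlike the percentage index, this scale is unbounded and symmetric around zero.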