sentimentr - different results for different text partitioning

Using sentimentr to analyse the text:

I haven’t been sad in a long time. I am extremely happy today. It’s a good day.

I first split the text sentence by sentence:

library(sentimentr)

ase1 <- c(
  "I haven't been sad in a long time.",
  "I am extremely happy today.",
  "It's a good day."
)

part1 <- get_sentences(ase1)
sentiment(part1)

   element_id sentence_id word_count sentiment
1:          1           1          8 0.1767767
2:          2           1          5 0.6037384
3:          3           1          4 0.3750000

Then I passed the same text as a single block:

ase2 <- c(
  "I haven’t been sad in a long time. I am extremely happy today. It’s a good day.")

part2 <- get_sentences(ase2)
sentiment(part2)

   element_id sentence_id word_count   sentiment
1:          1           1          9 -0.03333333
2:          1           2          5  0.60373835
3:          1           3          5  0.33541020

The text is the same, yet the word count and the sentiment score differ for the first and third sentences.

What causes this?

Solution

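The text is not quite the same. The first example uses the straight apostrophe ' (U+0027), while the second uses the typographic apostrophe ’ (U+2019). These are different characters, and text-mining tools treat them differently. The extra word counted in the first and third sentences (9 instead of 8, 5 instead of 4) suggests that "haven’t" and "It’s" were each split into two tokens, and the negative score for the first sentence suggests that the negation in "haven’t" was not recognised, so "sad" was scored as plain negative.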
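A quick way to confirm that the two apostrophes really are different characters is to compare their Unicode code points. This is a minimal base R sketch; the variable names `straight` and `curly` are only illustrative.

```r
# The two characters that look alike in the question
straight <- "'"       # ASCII apostrophe, U+0027
curly    <- "\u2019"  # right single quotation mark, U+2019

# Code points as integers
utf8ToInt(straight)   # 39
utf8ToInt(curly)      # 8217

# The strings are not equal, so a lookup keyed on "haven't" will not match "haven’t"
identical(straight, curly)  # FALSE
```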

With straight apostrophes, the single-block version returns the same word counts and sentiment scores as your first example:

ase2 <- c(
  "I haven't been sad in a long time. I am extremely happy today. It's a good day.")

part2 <- get_sentences(ase2)
sentiment(part2)
   element_id sentence_id word_count sentiment
1:          1           1          8 0.1767767
2:          1           2          5 0.6037384
3:          1           3          4 0.3750000
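If the input comes from a source that produces typographic quotes (word processors, web pages), retyping the text is not practical. One possible approach, sketched here with base R only, is to replace curly single quotes with the ASCII apostrophe before calling `get_sentences()`. The helper name `normalize_apostrophes` is hypothetical and not part of sentimentr.

```r
# Hypothetical helper: map curly single quotes (U+2018, U+2019) to "'"
normalize_apostrophes <- function(x) {
  x <- gsub("\u2018", "'", x, fixed = TRUE)
  x <- gsub("\u2019", "'", x, fixed = TRUE)
  x
}

# The single-block text from the question, with typographic apostrophes
ase2 <- "I haven\u2019t been sad in a long time. I am extremely happy today. It\u2019s a good day."
ase2_clean <- normalize_apostrophes(ase2)

# Score the cleaned text if sentimentr is installed
if (requireNamespace("sentimentr", quietly = TRUE)) {
  print(sentimentr::sentiment(sentimentr::get_sentences(ase2_clean)))
}
```

After this step `ase2_clean` contains only straight apostrophes, so it should be scored like the corrected example above.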