
Pipe Operators returning two rows for one comment


I am attempting to obtain sentiment scores on comments in a data frame with two columns, Author and Comment. I used the command

data %>%
  get_sentences() %>%
  sentiment() -> data_senti

to put the result into a variable. The result has more than twice as many rows as the input, because each comment is broken into n rows, one for each of its n sentences. How do I keep each comment intact, i.e. return only one row per comment instead of one row per sentence?

=> input example:

Author  Comment
Bob     "I love it. I do not hate it"

=> preferred output example

Author  Comment                          Sentiment
Bob     "I love it. I do not hate it"    0.02

what I actually get:

Author  Comment             Sentiment
Bob     "I love it"         1
Bob     "I do not hate it"  0.8

Full code so far:

# install the packages once (tidyverse already includes dplyr and magrittr), then load them
install.packages(c('syuzhet', 'sentimentr', 'tidyverse', 'magrittr', 'dplyr'))
library(syuzhet)
library(sentimentr)
library(tidyverse)

# set working dir
setwd("C:/Users/user3/OneDrive/Desktop/Sentiment_Analysis/Data/Output/Clean Sets")

# import file
data <- read.csv("C:/Users/user3/OneDrive/Desktop/Sentiment_Analysis/Data/Output/Clean Sets/all_comments.csv")

data %>%
  get_sentences() %>%
  sentiment() -> data_senti

UPDATE: dput(head(data))

structure(list(Primary.Key = c("Google_e1", "Google_e3", "Google_e98" ), Original.Text = c("I feel as though awards are fairly awarded, but the time frame in which awards are given is sometimes significantly late due at times to admin. Not providing awards to personnel in a timely fashion can give the perception that the command does not care about members", "Our all white, all male, all 40 or 50-something leadership is horrible to women and minorities. Every single person that has had difficulties in the workplace has been a woman, a minority, or both. On the other hand, the white men (mickey, donald, and others) are allowed to be abusive, dismissive, and unprofessional. white women (Minnie, and Cardi B and Lovely Jun) are allowed to make a sport out of gossiping about people and essentially run Mordor like a remake of the Mean Girls movie. I honestly find it hard to believe that the chain of command has NEVER NOTICED that the people they force out of their jobs are Latina women, black men, and other minorities, and likewise don't notice that the people they give uncontested promotions to are white men (Lebron) and white women (Meghan The Horse). Honestly, for an analytical organization, it's hard to believe that nobody has ever seen it through the eyes of a minority. This is why representation matters!!!", "I have not observed clear indications of \"better treatment\" based off of Race, Religion, Sex or sexual prefrence." )), row.names = c(NA, 3L), class = "data.frame")


Solution

  • Welcome to SO, Père Noël. The {sentimentr} package's get_sentences() splits the text input into sentences, as its name implies, so sentiment() returns one row per sentence. To get back one row per original comment, you need to group and summarise the sentence-level output produced by sentiment(). In this example, I simply average the sentiment scores and paste the sentences back together within each element_id.

    library(sentimentr)
    library(tidyverse)
    
    df <- tibble(Primary.Key   = c("Google_e1", "Google_e3", "Google_e98" ), 
                 Original.Text = c("I feel as though awards are fairly awarded, but the time frame in which awards are given is sometimes significantly late due at times to admin. Not providing awards to personnel in a timely fashion can give the perception that the command does not care about members", "Our all white, all male, all 40 or 50-something leadership is horrible to women and minorities. Every single person that has had difficulties in the workplace has been a woman, a minority, or both. On the other hand, the white men (mickey, donald, and others) are allowed to be abusive, dismissive, and unprofessional. white women (Minnie, and Cardi B and Lovely Jun) are allowed to make a sport out of gossiping about people and essentially run Mordor like a remake of the Mean Girls movie. I honestly find it hard to believe that the chain of command has NEVER NOTICED that the people they force out of their jobs are Latina women, black men, and other minorities, and likewise don't notice that the people they give uncontested promotions to are white men (Lebron) and white women (Meghan The Horse). Honestly, for an analytical organization, it's hard to believe that nobody has ever seen it through the eyes of a minority. This is why representation matters!!!", "I have not observed clear indications of 'better treatment' based off of Race, Religion, Sex or sexual prefrence." ))
    
    
    df %>% 
      get_sentences() %>% 
      sentiment() %>% 
      group_by(Primary.Key, element_id) %>% 
      summarise(comment = paste(Original.Text, collapse = " "), 
                sentiment = mean(sentiment)) %>% 
      ungroup()
    
    # A tibble: 3 x 4
      Primary.Key element_id comment                                                sentiment
      <chr>            <int> <chr>                                                      <dbl>
    1 Google_e1            1 I feel as though awards are fairly awarded, but the t~   -0.0410
    2 Google_e3            2 Our all white, all male, all 40 or 50-something leade~   -0.0735
    3 Google_e98           3 I have not observed clear indications of 'better trea~   -0.0943
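
As an alternative to grouping and summarising by hand, {sentimentr} also provides sentiment_by(), which aggregates the sentence scores per input element. The sketch below applies it to the Author/Comment layout from the question; the second row is a made-up comment added only for illustration. Note that sentiment_by() uses a down-weighted-zero average by default rather than a plain mean, so its scores will generally differ from the mean(sentiment) values above.

```r
library(sentimentr)

# Toy input in the Author/Comment layout from the question.
# The second row is a hypothetical comment added purely for illustration.
comments <- data.frame(
  Author  = c("Bob", "Alice"),
  Comment = c("I love it. I do not hate it",
              "The update is slow. I regret installing it."),
  stringsAsFactors = FALSE
)

# With no `by` argument, sentiment_by() aggregates the sentence-level scores
# per element, giving one row (and one element_id) per input comment.
scores <- sentiment_by(get_sentences(comments$Comment))

# Attach the per-comment score by matching on element_id, so the original
# Comment text is never split or reordered.
comments$Sentiment <- scores$ave_sentiment[
  match(seq_len(nrow(comments)), scores$element_id)
]

print(comments)
```

This keeps the original data frame untouched apart from the new Sentiment column, which may be simpler when only the score is needed and not the sentence-level rows.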