I am working on a function that runs a sentiment analysis for each emotion in a list of NRC dictionary emotions (see: https://www.tidytextmining.com/sentiment.html#sentiment-analysis-with-inner-join), and then saves each score as a variable in a dataframe or tibble. I've got the actual analysis part down, but saving the score in the dataframe or tibble is not working.
#Creating List of All Emotions To Apply This To
emotion <- c('anger', 'disgust', 'joy', 'surprise', 'anticip', 'fear', 'sadness', 'trust')
#Initialize List with Length of Emotion Vector
wcount <- vector("list", length(emotion))
#Create Tibble for me to Deposit the Result Into
nrc_tib <-tibble(id="",
anger=numeric(0),
disgust=numeric(0),
joy=numeric(0),
surprise=numeric(0),
anticip=numeric(0),
fear=numeric(0),
sadness=numeric(0),
trust=numeric(0))
#Create Row to Deposit Variable Into
nrc_tib <-add_row(nrc_tib, 'id'="transcript1.txt")
#Defining Function
sentimentanalysis_nrc <- function(emoi) {
#Getting Sentiment, Filtering by Emotion in List
nrc_list <- get_sentiments("nrc") %>%
filter(sentiment == emoi)
#Conducting Sentiment Analysis, Saving Results
wcount[[emoi]] <- wordcount %>%
inner_join(nrc_list) %>%
count(word, sort = TRUE)
#Calculating Sentiment Score for Given Emotion
score <- sum(wcount[[emoi]]$n)
#Saving Emotion in nrc_tib, which is the part that doesn't work
nrc_tib$emoi <- score
}
#Running the Function
lapply(emotion, FUN = sentimentanalysis_nrc)
I've tried a few different things, including putting emoi in brackets in the line that doesn't work, and some googling suggests that isn't allowed. What would be allowed if I wanted to save it?
Note: If this helps for context...this example uses the file transcript1.txt, but my goal eventually is to generalize this across transcript2.txt-transcript45.txt, binding the scores for all 45 transcripts together afterwards.
EDIT: I came up with a clunky solution, using:
nrc_tib <<- replace(nrc_tib, emoi, score)
But there's got to be a better solution than that.
One of the big benefits of using tidy data principles is that problems like this become quite tractable! You can do this using joins.
I'll use Jane Austen's novels as examples since you didn't post example data. Think of each book as one of your transcripts. The first step is to tidy the text data using unnest_tokens().
library(tidyverse)
library(tidytext)
library(janeaustenr)
tidy_books <- austen_books() %>%
unnest_tokens(word, text)
tidy_books
#> # A tibble: 725,055 x 2
#> book word
#> <fct> <chr>
#> 1 Sense & Sensibility sense
#> 2 Sense & Sensibility and
#> 3 Sense & Sensibility sensibility
#> 4 Sense & Sensibility by
#> 5 Sense & Sensibility jane
#> 6 Sense & Sensibility austen
#> 7 Sense & Sensibility 1811
#> 8 Sense & Sensibility chapter
#> 9 Sense & Sensibility 1
#> 10 Sense & Sensibility the
#> # … with 725,045 more rows
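For your own transcripts, the equivalent of tidy_books could be built along these lines. This is only a sketch: the folder location and the two tiny stand-in files are made up so the example runs on its own, and `transcript_dir`, `transcripts` and `tidy_transcripts` are hypothetical names. With real data you would point `transcript_dir` at the folder that holds transcript1.txt through transcript45.txt and skip the writeLines() calls.

```r
library(tidyverse)
library(tidytext)

# Stand-in data (assumption): two tiny transcripts written to a temp folder
transcript_dir <- file.path(tempdir(), "transcripts")
dir.create(transcript_dir, showWarnings = FALSE)
writeLines(c("I was happy and hopeful.", "Then came the fear."),
           file.path(transcript_dir, "transcript1.txt"))
writeLines("A sad, angry day.",
           file.path(transcript_dir, "transcript2.txt"))

# Find every transcript file in the folder
files <- list.files(transcript_dir,
                    pattern = "^transcript[0-9]+\\.txt$",
                    full.names = TRUE)

# One row per line of text, with the file name kept in `id`
transcripts <- tibble(id = basename(files),
                      text = map(files, read_lines)) %>%
  unnest(text)

# One row per word, just like tidy_books (with `id` in place of `book`)
tidy_transcripts <- transcripts %>%
  unnest_tokens(word, text)

tidy_transcripts
```

Keeping the file name in an `id` column means all transcripts live in one data frame, so there is nothing to bind together afterwards.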
Then you can perform the sentiment analysis using an inner_join(). Notice that this join matches each word with every emotion it is tagged with, so a word appears in the result more than once when appropriate (see "respectable" and "good" below).
tidy_books %>%
inner_join(get_sentiments("nrc"))
#> Joining, by = "word"
#> # A tibble: 177,363 x 3
#> book word sentiment
#> <fct> <chr> <chr>
#> 1 Sense & Sensibility sense positive
#> 2 Sense & Sensibility sensibility positive
#> 3 Sense & Sensibility long anticipation
#> 4 Sense & Sensibility respectable positive
#> 5 Sense & Sensibility respectable trust
#> 6 Sense & Sensibility general positive
#> 7 Sense & Sensibility general trust
#> 8 Sense & Sensibility good anticipation
#> 9 Sense & Sensibility good joy
#> 10 Sense & Sensibility good positive
#> # … with 177,353 more rows
Now you can count() up the sentiment scores for each book (transcript in your case) and emotion/affect.
tidy_books %>%
inner_join(get_sentiments("nrc")) %>%
count(book, sentiment)
#> Joining, by = "word"
#> # A tibble: 60 x 3
#> book sentiment n
#> <fct> <chr> <int>
#> 1 Sense & Sensibility anger 1343
#> 2 Sense & Sensibility anticipation 3698
#> 3 Sense & Sensibility disgust 1172
#> 4 Sense & Sensibility fear 1861
#> 5 Sense & Sensibility joy 3364
#> 6 Sense & Sensibility negative 4005
#> 7 Sense & Sensibility positive 7429
#> 8 Sense & Sensibility sadness 2064
#> 9 Sense & Sensibility surprise 1589
#> 10 Sense & Sensibility trust 4222
#> # … with 50 more rows
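If you want the layout from your question, with one row per transcript and one column per emotion, the long counts can be reshaped with tidyr's pivot_wider(). A minimal sketch follows; the counts are made-up toy numbers shaped like the count() output, and `sentiment_counts` and `nrc_wide` are hypothetical names.

```r
library(tidyverse)

# Toy long-format counts (made-up numbers), shaped like count(book, sentiment)
sentiment_counts <- tribble(
  ~id,               ~sentiment, ~n,
  "transcript1.txt", "anger",    12L,
  "transcript1.txt", "joy",      30L,
  "transcript2.txt", "anger",     7L,
  "transcript2.txt", "trust",    21L
)

# One row per transcript, one column per emotion;
# emotions that never occur in a transcript are filled with 0
nrc_wide <- sentiment_counts %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = list(n = 0L))

nrc_wide
```

With the real data, the same pivot_wider() call can be piped directly after count(book, sentiment).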
You can even pipe the counts straight into ggplot() to make a plot!
tidy_books %>%
inner_join(get_sentiments("nrc")) %>%
count(book, sentiment) %>%
ggplot(aes(sentiment, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~book, scales = "free_y") +
coord_flip()
#> Joining, by = "word"
Created on 2019-12-13 by the reprex package (v0.3.0)
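As for why nrc_tib$emoi <- score did not work in the original function: `$` takes the name after it literally, so it refers to a column called "emoi" rather than the column named by the value of emoi. Double brackets, `[[emoi]]`, use the value instead. Separately, an assignment inside a function changes only the function's local copy of nrc_tib, which is why `<<-` appeared to help. The usual pattern is to return the score and do the assignment outside the function. A small base R sketch of both points, where `scores` and `score_one` are hypothetical names and the scoring function is a placeholder rather than a real sentiment score:

```r
scores <- data.frame(id = "transcript1.txt", stringsAsFactors = FALSE)
emoi <- "anger"

scores$emoi <- 1      # creates a column literally named "emoi"
scores[[emoi]] <- 2   # creates a column named "anger"

names(scores)

# Return the value from the function and assign outside it
emotion <- c("anger", "joy")
score_one <- function(emoi) {
  nchar(emoi)         # placeholder for the real score calculation
}
scores[emotion] <- lapply(emotion, score_one)

scores
```

This makes the loop version work, although the join-and-count approach above avoids the loop entirely.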