Problems creating a key value store in R


I'm trying to create a key value store with the key being entities and the value being the average sentiment score of the entity in news articles.

I have a data frame, news_us, containing news articles, and a list called organization1 holding the entities identified in those articles by a classifier. The first element of organization1 contains the entities identified in the article in the first row of news_us, and so on. I'm trying to iterate through organization1 and build a key-value store in which the key is an entity and the value is the sentiment score of the news description that mentions it. The code I have doesn't change the scores in the sentiment list, and I don't know why. My first guess was that I would have to use the $ operator on the sentiment list to add the value, but that didn't change anything either. Here is the code I have so far:

library(syuzhet)
sentiment <- list()
organization1 <- list(NULL, "US", "Bath", "Animal Crossing", "World Health Organization", 
    NULL, c("Microsoft", "Facebook"))
news_us <- structure(list(title = c("Stocks making the biggest moves after hours: Bed Bath & Beyond, JC Penney, United Airlines and more - CNBC", 
"Los Angeles mayor says 'very difficult to see' large gatherings like concerts and sporting events until 2021 - CNN", 
"Bed Bath & Beyond shares rise as earnings top estimates, retailer plans to maintain some key investments - CNBC", 
"6 weeks with Animal Crossing: New Horizons reveals many frustrations - VentureBeat", 
"Timeline: How Trump And WHO Reacted At Key Moments During The Coronavirus Crisis : Goats and Soda - NPR", 
"Michigan protesters turn out against Whitmer’s strict stay-at-home order - POLITICO"
), description = c("Check out the companies making headlines after the bell.", 
"Los Angeles Mayor Eric Garcetti said Wednesday large gatherings like sporting events or concerts may not resume in the city before 2021 as the US grapples with mitigating the novel coronavirus pandemic.", 
"Bed Bath & Beyond said that its results in 2020 \"will be unfavorably impacted\" by the crisis, and so it will not be offering a first-quarter nor full-year outlook.", 
"Six weeks with Animal Crossing: New Horizons has helped to illuminate some of the game's shortcomings that weren't obvious in our first review.", 
"How did the president respond to key moments during the pandemic? And how did representatives of the World Health Organization respond during the same period?", 
"Many demonstrators, some waving Trump campaign flags, ignored organizers‘ pleas to stay in their cars and flooded the streets of Lansing, the state capital."
), name = c("CNBC", "CNN", "CNBC", "Venturebeat.com", "Npr.org", 
"Politico")), na.action = structure(c(`35` = 35L, `95` = 95L, 
`137` = 137L, `154` = 154L, `213` = 213L, `214` = 214L, `232` = 232L, 
`276` = 276L, `321` = 321L), class = "omit"), row.names = c(NA, 
6L), class = "data.frame")
i = as.integer(0)
for(index in organization1){
  i <- i+1
   if(is.character(index)) { #if entity is not null/NA
     val <- get_sentiment(news_us$description[i], method = "afinn")
     #print(val)
     print(sentiment[[index[1]]])
     sentiment[[index[1]]] <- sentiment[[index[1]]]+val
   }
}

Here is the sentiment list after running the above code chunk:

$US
integer(0)

$Bath
integer(0)

$`Animal Crossing`
integer(0)

$`World Health Organization`
integer(0)

$`Apple TV`
integer(0)

$`Pittsburgh Steelers`
integer(0)

Whereas I would like it to look something like:

$US
1.3

$Bath
0.3

$`Animal Crossing`
2.4

$`World Health Organization`
1.2

$`Apple TV`
-0.7

$`Pittsburgh Steelers`
0.3

An article can have several entities identified in it, so the same description score may apply to more than one key.


Solution

  • I am not sure how organization1 and news_us$description are related, but perhaps you meant something like this?

    library(syuzhet)
    
    setNames(lapply(news_us$description, get_sentiment), unlist(organization1))
    
    #$US
    #[1] 0
    
    #$Bath
    #[1] -0.4
    
    #$`Animal Crossing`
    #[1] -0.1
    
    #$`World Health Organization`
    #[1] 1.1
    
    #$Microsoft
    #[1] -0.6
    
    #$Facebook
    #[1] -1.9
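
On why the loop in the question leaves every entry empty: sentiment[[index[1]]] is NULL the first time a key is seen, and arithmetic on NULL in R yields a zero-length vector, so NULL + val never stores val. Below is a minimal sketch of one way around this, assuming element i of the entity list belongs to row i of the data frame. The helper average_by_entity and the toy inputs are hypothetical, made up for illustration, and are not part of syuzhet.

```r
# Arithmetic on NULL gives a zero-length vector, which is why the loop
# in the question never accumulates anything.
length(NULL + 1.5)  # 0

# Hypothetical helper (not part of syuzhet).
# entities: list with one element per article (NULL if no entity was found)
# scores:   numeric vector with one sentiment score per article
average_by_entity <- function(entities, scores) {
  stopifnot(length(entities) == length(scores))
  # Repeat each article's score once for every entity found in that article,
  # so scores and entity names stay aligned after unlist() drops the NULLs.
  long_scores <- rep(scores, times = lengths(entities))
  long_entities <- unlist(entities)
  # Group the scores by entity and average each group.
  lapply(split(long_scores, long_entities), mean)
}

# Toy data: "US" appears in articles 2 and 3, "Bath" only in article 3.
entities <- list(NULL, "US", c("US", "Bath"), NULL)
scores <- c(9, 1, 3, -2)
store <- average_by_entity(entities, scores)
store$US    # 2
store$Bath  # 3
```

With the question's data, the scores could come from get_sentiment(news_us$description, method = "afinn"), which accepts a character vector. Note that the sample posted in the question has seven list elements but only six rows, so the length check would stop until the two are aligned.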