Sentiment wordcloud using R's quanteda?


I have a set of reviews (a text comment plus a rating from 0 to 10) and I want to create a sentiment word cloud in R, in which:

  • A word's size represents its frequency
  • A word's color represents the average rating of all reviews it occurs in (preferably a color gradient green-yellow-red)

I used quanteda to create a dfm of the comments. Now I think I want to use the textplot_wordcloud function and I guess I need to do the following:

  1. For each word, get all the reviews it appeared in
  2. Calculate the average rating of this subset of reviews
  3. Divide by 10 to scale to 0-1 and assign this value to this word
  4. Sort the words by average rating (so that the colors are assigned correctly?)
  5. Use color=RColorBrewer::brewer.pal(11, "RdYlGn") to calculate colors from the average ratings

I'm having trouble with steps 1 and 4. The rest should be doable. Can somebody explain how a dfm can be read and manipulated easily?


Solution

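  • I found an efficient way to do this using matrix multiplication: basically the computation is sw = (sd %*% C) / Nw, with the division done element-wise, where:

    • sw = sentiment per word (one value per word)
    • sd = ratings per document (one value per document)
    • C = per-document word frequency matrix (documents in rows, words in columns)
    • Nw = number of occurrences per word (the column sums of C)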

    In code:

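    # seq_gradient_pal() comes from the scales package
    library(scales)
    
    # create the necessary variables
    sd <- as.integer(df$rating)   # ratings, one per document
    C <- as.matrix(my_dfm)        # documents x words count matrix
    Nw <- as.integer(colSums(C))  # total occurrences of each word
    
    # calculate the word sentiment (rating sum per word / occurrences per word)
    sw <- as.integer(sd %*% C) / Nw
    
    # normalize the word sentiment to values between 0 and 1
    sw <- (sw - min(sw)) / (max(sw) - min(sw))
    
    # make a function that converts a sentiment value to a color (red = low, green = high)
    num_to_color <- seq_gradient_pal(low="#FF0000", high="#00FF00")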
    
    # apply the function to the sentiment values
    word_colors <- num_to_color(sw)
    
    # open a new graphics window;
    # maximize it manually before running the next command so the word cloud is easier to read
    dev.new()
    
    # create the wordcloud with the calculated color values
    textplot_wordcloud(my_dfm, color=word_colors)
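
    To see what the matrix formula does, here is a minimal sketch on made-up data, using base R only. The ratings, the count matrix and all variable names are hypothetical and not taken from the original data. It also shows the unweighted variant from the question (average over the reviews a word occurs in, regardless of how often) and a red-yellow-green gradient built with grDevices::colorRamp, which is one possible way to get the three-color scale the question asked for.

```r
# Hypothetical toy data: 3 reviews, 4 words
ratings <- c(2, 8, 10)                      # sd: one rating per review
counts <- matrix(
  c(1, 0, 2, 0,
    0, 1, 1, 0,
    0, 2, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("review", 1:3), c("bad", "good", "food", "great"))
)                                           # C: reviews in rows, words in columns

# Step 1 of the question: which reviews contain a given word?
reviews_with_good <- which(counts[, "good"] > 0)   # review2, review3

# Count-weighted average rating per word: sw = (sd %*% C) / Nw
word_totals <- colSums(counts)              # Nw
word_sentiment <- as.vector(ratings %*% counts) / word_totals
names(word_sentiment) <- colnames(counts)
# bad = 2, good = 28/3, food = 4, great = 10

# Unweighted variant: every review counts once per word it contains
presence <- (counts > 0) * 1                # 0/1 matrix
word_sentiment_unweighted <- as.vector(ratings %*% presence) / colSums(presence)
names(word_sentiment_unweighted) <- colnames(counts)
# bad = 2, good = 9, food = 5, great = 10

# Scale to [0, 1] and map onto a red-yellow-green gradient
scaled <- (word_sentiment - min(word_sentiment)) /
  (max(word_sentiment) - min(word_sentiment))
ramp <- grDevices::colorRamp(c("red", "yellow", "green"))
word_colors_toy <- grDevices::rgb(ramp(scaled), maxColorValue = 255)
names(word_colors_toy) <- names(scaled)
```

    Note that the weighted version counts a review once for every occurrence of the word, so "good" (twice in the 10-rated review) ends up at 28/3 rather than 9. If the plain per-review average is what you want, the 0/1 matrix is the one to multiply with; for a quanteda dfm, dfm_weight(my_dfm, scheme = "boolean") should give the equivalent.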