r colors text-mining word-cloud color-palette

Wordcloud showing colour based on continuous metadata in R


I'm creating a wordcloud in which the size of the words is based on frequency, but I want the colour of the words to be mapped to a third variable (stress, a continuous numeric variable giving the amount of stress associated with each word).

I tried the following, which gave me only two different colours (yellow and purple), whereas I want something smoother: a colour range, like a palette that goes from green to red, for example.

library(dplyr)
library(wordcloud)

df = data.frame(word = c("calling", "meeting", "conference", "contract", "negotiation", "email"),
n = c(20, 12, 4, 8, 10, 43),
stress = c(23, 30, 15, 40, 35, 15))
df = as_tibble(df)
wordcloud(words = df$word, freq = df$n, col = df$stress)

Does anyone know how to deal with this continuous metadata and get a smoothly changing colour for the words as stress goes up? Thanks!


Solution

  • Here is a potential solution using the wordcloud2 package. Since I do not know your real data, I created sample data to demonstrate a prototype.

    If you have many words, I am not sure that mapping colour to a continuous variable (stress) is a good idea. One alternative is to create a new grouping variable with cut(), which reduces the number of colours used in the graphic. Here, I created a new column called color holding five colours from the viridis package.

    wordcloud2() needs only two things: the data and the colours. Font size reflects word frequency automatically, without being specified.

    mydf = data.frame(word = c("calling", "meeting", "conference", "contract", "negotiation",
                               "email", "friends", "chat", "text", "deal",
                               "business", "promotion", "discount", "users", "family"),
                      n = c(20, 12, 4, 8, 10, 43, 33, 5, 47, 28, 12, 9, 50, 31, 22),
                      stress = c(23, 30, 15, 40, 35, 15, 30, 18, 10, 5, 29, 38, 45, 8, 3))
    
    
              word  n stress
    1      calling 20     23
    2      meeting 12     30
    3   conference  4     15
    4     contract  8     40
    5  negotiation 10     35
    6        email 43     15
    7      friends 33     30
    8         chat  5     18
    9         text 47     10
    10        deal 28      5
    11    business 12     29
    12   promotion  9     38
    13    discount 50     45
    14       users 31      8
    15      family 22      3
    
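    library(dplyr)
    library(wordcloud2)
    library(viridis)  # source of the hex codes below; not needed to run this code
    
    # Bin stress into five groups and label each bin with a viridis hex colour
    # (yellow = lowest stress, dark purple = highest).
    temp <- mutate(mydf, color = cut(stress, breaks = c(0, 10, 20, 30, 40, Inf),
                                     labels = c("#FDE725FF", "#73D055FF", "#1F968BFF",
                                                "#2D708EFF", "#481567FF"),
                                     include.lowest = TRUE))
    
    # wordcloud2() reads the words and frequencies from the first two columns.
    # cut() returns a factor, so pass the colours as a character vector.
    wordcloud2(data = temp, color = as.character(temp$color))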
    

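    If a smooth green-to-red gradient is still preferred over binned groups, one option is to interpolate a colour for each word directly from its stress value. The following is only a sketch, not part of the original answer: it uses the first six words of the sample data, picks green, yellow and red as the palette stops (an assumption), and relies on colorRamp() and rgb() from base R's grDevices. It also assumes the stress values are not all identical, since the rescaling would otherwise divide by zero.

```r
# Sample data: the first six words from the answer's data frame.
mydf <- data.frame(
  word   = c("calling", "meeting", "conference", "contract", "negotiation", "email"),
  n      = c(20, 12, 4, 8, 10, 43),
  stress = c(23, 30, 15, 40, 35, 15)
)

# colorRamp() returns a function mapping values in [0, 1] to rows of RGB values.
# The three stops (green, yellow, red) are an illustrative choice.
ramp <- colorRamp(c("green", "yellow", "red"))

# Rescale stress to [0, 1]: the lowest stress becomes 0, the highest becomes 1.
scaled <- (mydf$stress - min(mydf$stress)) / diff(range(mydf$stress))

# Convert the interpolated RGB rows to hex strings, one colour per word.
mydf$color <- rgb(ramp(scaled), maxColorValue = 255)

# Build the cloud only if wordcloud2 is installed; print(cloud) renders it.
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(data = mydf[, c("word", "n")],
                                  color = mydf$color)
}
```

    With this mapping, the least stressful words come out pure green and the most stressful pure red, with intermediate values passing through yellow. With many words the differences between neighbouring shades can be hard to read, which is the reason the binned approach above may still be the better choice.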