
Conditional wordcloud R


I have been searching for a few hours now and I’m very close, but I just can’t get it to work. I have a word-frequency table that I want to use to build a word cloud, and I would like the plotted colours to carry some meaning. For that reason I’ve added a third column to my data.frame that determines the colour used for each word in the wordcloud.

In the example below, the column “diff” is the difference between each city’s population and a threshold (6).

I would like the green and red to reflect the size of the difference between each city’s population and the threshold (that part is working thanks to the post here). The tricky bit is that I would like cities whose population equals the threshold to have a specific colour (grey, "#c5c5c5"), and that I just can’t do.

library(wordcloud)
library(tm)

DF <- data.frame(
city = c("New York","Barcelona","Paris","Rome","London", "Brussels", "Leeds", "Berlin"),
pop = c(12,7,5,7,6,2,0.8,6),
diff= c(6,1,-1,1,0,-4,-5.2,0))


custColorPal <- colorRampPalette(c("#ff0000","#00cc00"))

color_range_number <- length(unique(DF$diff))

custColors <- custColorPal(color_range_number)
colors <- custColors[factor(DF$diff)]

wordcloud(DF$city, DF$pop, colors=custColors, min.freq = 0.1, ordered.colors=FALSE)

In the example above I would expect two cities to be grey, three to be green and three to be red.

Second attempt: I have managed (with the help of jazzurro) to colour grey the names of the cities whose pop equals the threshold. However, if you run the code below you will see something odd. We should get only one red city name, but now we get several (I’ve changed the initial values to test it). I understand that the gradient is evenly distributed across the distinct values, but if the values are stretched in one direction it just does not work.

Is there a way to use two gradients at the same time? One for values greater than zero and another for values less than zero (or any other value)?

library(dplyr)  # provides case_when()

DF <- data.frame(
  city = c("New York","Barcelona","Paris","Rome","London", "Brussels", "Leeds", "Berlin"),
  pop = c(12,7,5,7,6,2,0.8,6),
  diff= c(20,1,10,1,0,7,-0.2,0))
DF$city<-as.character(DF$city)

custColorPal <- colorRampPalette(c("#ff0000","#00cc00"))
color_range_number <- length(unique(DF$diff))
custColors <- custColorPal(color_range_number)
colors <- custColors[factor(DF$diff)]

DF<-cbind(DF,colors)

DF$colors<-as.character(DF$colors)

DF<-transform(DF, colors = case_when(
  diff == 0 ~ "#c5c5c5", 
  TRUE   ~ colors
))

wordcloud(DF$city, DF$pop, colors=DF$colors, min.freq = 0.1, ordered.colors=TRUE)

Thanks in advance for any pointers

Cheers


Solution

  • Given your comment, I came up with the following idea. I do not know your actual data, so you will need to adapt this code. I modified your original DF by changing the values in diff: the maximum is now 90 and the minimum is -95. First I created 100 colors running from green to gray with colorRampPalette(), then 101 colors running from gray to red, and combined the two vectors. Gray would otherwise appear twice, which is why the line for mycolors drops the first negative color with [-1]. How you build the palettes depends on the range of your actual data. Once the colors are ready, I add a new column to the data set, using diff inside case_when() to pick the index of a color in mycolors. Finally, I draw the wordcloud with ordered.colors = TRUE, so that each word gets the color in its own row. I hope you can adjust this code for your own data.

    library(tidyverse)
    library(wordcloud)
    
    DF <- data.frame(city = c("New York","Barcelona","Paris","Rome","London", "Brussels", "Leeds", "Berlin"),
                     pop = c(12, 7, 5, 7, 6, 2, 0.8, 6),
                     diff = c(60, 20, -30, 90, 0, -10, -95, 0))
    
    # Create gradient colors for positive and negative numbers.
    
    positive_color_palette <- colorRampPalette(colors = c("green", "gray"), space = "Lab")(100)
    negative_color_palette <- colorRampPalette(colors = c("gray", "red"), space = "Lab")(101)
    
    mycolors <- c(positive_color_palette, negative_color_palette[-1])
    
    # mycolors has 200 entries: index 1 is green, index 100 is gray,
    # and index 200 is red. A diff value maps to index 100 - diff, so this
    # works for diff from 99 (index 1) down to -100 (index 200).
    
    mutate(DF,
           colors = case_when(100 + diff > 100 ~ mycolors[100 - diff],
                              100 + diff < 100 ~ mycolors[100 - diff],
                              100 + diff == 100 ~ mycolors[100])) -> res
    
    
    wordcloud(words = res$city, freq = res$pop, colors = res$colors,
              min.freq = 0.1, random.order = FALSE, ordered.colors = TRUE)
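
If the range of diff is not known in advance, one option is to scale the color index by the largest absolute difference in the data instead of hard-coding 100. The sketch below is only one way to do that, using base R for the colors. `diverging_colors()` and its arguments are hypothetical names made up for this illustration (they are not part of wordcloud), and the function assumes diff contains no NA values.

```r
# Hypothetical helper (not part of wordcloud): map each value in x to a color
# on a two-sided gradient centred on `mid`. Assumes x has no NA values.
diverging_colors <- function(x, mid = 0,
                             low = "#ff0000", neutral = "#c5c5c5",
                             high = "#00cc00", n = 100) {
  stopifnot(is.numeric(x), !anyNA(x))

  # n shades per side, from just off neutral out to the extreme color.
  # Dropping the first element removes the exact neutral shade, so only
  # values equal to `mid` end up with the neutral color.
  neg_pal <- colorRampPalette(c(neutral, low))(n + 1)[-1]
  pos_pal <- colorRampPalette(c(neutral, high))(n + 1)[-1]

  d <- x - mid
  max_dev <- max(abs(d))
  out <- rep(neutral, length(x))
  if (max_dev == 0) return(out)  # every value sits on the threshold

  # Palette index in 1..n, proportional to the distance from `mid`.
  idx <- pmin(n, pmax(1, ceiling(abs(d) / max_dev * n)))

  out[d > 0] <- pos_pal[idx[d > 0]]
  out[d < 0] <- neg_pal[idx[d < 0]]
  out
}

DF <- data.frame(city = c("New York", "Barcelona", "Paris", "Rome",
                          "London", "Brussels", "Leeds", "Berlin"),
                 pop  = c(12, 7, 5, 7, 6, 2, 0.8, 6),
                 diff = c(60, 20, -30, 90, 0, -10, -95, 0))

DF$colors <- diverging_colors(DF$diff)

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = DF$city, freq = DF$pop, colors = DF$colors,
                       min.freq = 0.1, random.order = FALSE,
                       ordered.colors = TRUE)
}
```

Because both sides are scaled by the same maximum deviation, a difference of +20 and a difference of -20 sit equally far from gray. If each side should instead use its full gradient, the positive and negative values could be scaled by their own maxima.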
    
