
Word cloud comparison for two lists of words


I have two character vectors (named df1 and df2) as follows:

df1 <- c('apple','banana','cherry', 'melon', 'grape', 'fig', 'guava', 'strawberry', 'blueberry')
df2 <- c('apple','pineapple','kiwi', 'banana', 'orange', 'lemon', 'peach','avocado','pear','cherry','mango','coconut')

I want to draw a word cloud that compares the words from these two vectors, so that the shared words ('apple', 'banana', 'cherry') appear in the middle.

I hope to be able to receive some help with this. Many thanks!


Solution

  • If you assume that the words within each vector are unique, you could create a word cloud in which words found only in df1 get frequency 1, words found only in df2 get frequency 2, and the shared words get frequency 3. We can then color the three groups differently and make the shared words the biggest and red. With random.order = FALSE the words are plotted in decreasing frequency, so the shared words are placed first, near the center. This is just the basic round shape; you can choose all sorts of shapes as per this vignette.


    Code

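    # Install wordcloud if it is missing, then load it
    # (the original one-liner did not attach the package after a fresh install)
    if (!requireNamespace("wordcloud", quietly = TRUE)) install.packages("wordcloud")
    library(wordcloud)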
    
    # your data
    df1 <- c('apple','banana','cherry', 'melon', 'grape', 'fig', 'guava', 'strawberry', 'blueberry')
    df2 <- c('apple','pineapple','kiwi', 'banana', 'orange', 'lemon', 'peach','avocado','pear','cherry','mango','coconut')
    
    # Find unique and shared words
    shared_words <- intersect(df1, df2)
    unique_df1_words <- setdiff(df1, df2)
    unique_df2_words <- setdiff(df2, df1)
    
    # Artificial frequencies: 1 = only in df1, 2 = only in df2, 3 = shared
    freq_unique_df1 <- rep(1, length(unique_df1_words))
    freq_unique_df2 <- rep(2, length(unique_df2_words))
    freq_shared <- rep(3, length(shared_words))
    
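    # wordcloud() has no 'main' argument (extra arguments are passed on to text()),
    # so draw the title in its own panel above the cloud
    layout(matrix(c(1, 2), nrow = 2), heights = c(1, 4))
    par(mar = rep(0, 4))
    plot.new()
    text(x = 0.5, y = 0.5, "Word Comparison: Shared and Unique Words")
    
    set.seed(34) # fixes the random part of the layout
    wordcloud(
      words = c(shared_words, unique_df1_words, unique_df2_words),
      freq = c(freq_shared, freq_unique_df1, freq_unique_df2),
      colors = c("darkblue", # freq 1: words only in df1
                 "lightblue", # freq 2: words only in df2
                 "red"), # freq 3: the shared words ;)
      scale = c(3, 0.5),  # largest and smallest word size
      min.freq = 1,
      max.words = 200,
      random.order = FALSE, # plot in decreasing frequency, shared words first
      rot.per = 0 # proportion of words rotated by 90 degrees; 0 keeps them all horizontal
    )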
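
The code above relies on the assumption that no word repeats within a vector. If that might not hold, one possible variation (a sketch only, not part of the original answer) is to deduplicate first and derive each word's group, frequency and color from a lookup table. The names `cloud_df`, `group_freq` and `group_col` are made up for this illustration, and dropping repeats is itself an assumption that may not suit every data set.

```r
# Same data as in the question
df1 <- c('apple','banana','cherry', 'melon', 'grape', 'fig', 'guava', 'strawberry', 'blueberry')
df2 <- c('apple','pineapple','kiwi', 'banana', 'orange', 'lemon', 'peach','avocado','pear','cherry','mango','coconut')

# Assumption: repeated words carry no extra meaning, so count each word once per vector
w1 <- unique(df1)
w2 <- unique(df2)

# Every distinct word, with flags for where it occurs
all_words <- union(w1, w2)
in1 <- all_words %in% w1
in2 <- all_words %in% w2

# Label each word by membership
group <- ifelse(in1 & in2, "shared", ifelse(in1, "df1 only", "df2 only"))

# Hypothetical lookup tables mirroring the frequencies and colors used in the answer
group_freq <- c("df1 only" = 1, "df2 only" = 2, "shared" = 3)
group_col  <- c("df1 only" = "darkblue", "df2 only" = "lightblue", "shared" = "red")

cloud_df <- data.frame(
  word  = all_words,
  group = group,
  freq  = unname(group_freq[group]),
  color = unname(group_col[group]),
  stringsAsFactors = FALSE
)

# Put the highest-frequency (shared) words first
cloud_df <- cloud_df[order(-cloud_df$freq), ]
print(cloud_df)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(34)
  wordcloud::wordcloud(
    words = cloud_df$word,
    freq = cloud_df$freq,
    colors = cloud_df$color,
    ordered.colors = TRUE, # one color per word, in the same order as words
    scale = c(3, 0.5),
    min.freq = 1,
    random.order = FALSE,
    rot.per = 0
  )
}
```

Keeping the group label in a data frame makes it easy to check which words landed in which category before plotting, and `ordered.colors = TRUE` ties each color to its word directly instead of relying on the frequency-to-color mapping.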