I have two character vectors as follows:
df1 <- c('apple','banana','cherry', 'melon', 'grape', 'fig', 'guava', 'strawberry', 'blueberry')
df2 <- c('apple','pineapple','kiwi', 'banana', 'orange', 'lemon', 'peach','avocado','pear','cherry','mango','coconut')
I want to draw a word cloud that compares the words in these two vectors, so that the shared words ('apple', 'banana', 'cherry') appear in the middle.
Any help with this would be appreciated. Many thanks!
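If you assume that the words within each vector are unique, you could create a word cloud in which words found only in df1 get a frequency of 1, words found only in df2 a frequency of 2, and shared words a frequency of 3. With random.order = FALSE, wordcloud() draws words in decreasing order of frequency, and each word starts searching for a free spot at the center of the plot, so the shared words are placed first, in the middle, and at the largest size. The colors vector then gives each frequency level its own color, which makes the shared words red. This is just the basic round shape; you can choose all sorts of shapes as per this vignette.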
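To see why this puts the shared words in the middle and in red, the sketch below reproduces, without drawing anything, how wordcloud() appears to turn freq into a text size and a color when ordered.colors = FALSE. It is based on my reading of the package source (size = (scale[1] - scale[2]) * freq / max(freq) + scale[2], color = colors[ceiling(length(colors) * freq / max(freq))]) and may not match every version of the package; the group labels are only for illustration.

```r
# Sketch (assumption): mimics wordcloud()'s freq -> size/color mapping when
# ordered.colors = FALSE; based on a reading of the package source and may
# differ between versions of the wordcloud package.
group  <- c("shared", "df2 only", "df1 only")
freq   <- c(3, 2, 1)                          # same values as in the answer
scale  <- c(3, 0.5)                           # same scale as in the answer
colors <- c("darkblue", "lightblue", "red")   # ordered from least to most frequent

normed <- freq / max(freq)                            # 1, 2/3, 1/3
cex    <- (scale[1] - scale[2]) * normed + scale[2]   # text size per group
col    <- colors[ceiling(length(colors) * normed)]    # color per group

# With random.order = FALSE the words are drawn in decreasing freq order,
# and each one starts looking for space at the center (0.5, 0.5), so the
# shared words are placed first and end up in the middle.
mapping <- data.frame(group = group, cex = round(cex, 2), col = col,
                      stringsAsFactors = FALSE)
print(mapping)
```

Because the three groups get distinct normalized frequencies (1, 2/3 and 1/3), each one falls into its own third of the colors vector, which is why exactly three colors are supplied.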
# Install wordcloud if needed, then load it
if (!require(wordcloud)) {
  install.packages("wordcloud")
  library(wordcloud)
}
# your data
df1 <- c('apple','banana','cherry', 'melon', 'grape', 'fig', 'guava', 'strawberry', 'blueberry')
df2 <- c('apple','pineapple','kiwi', 'banana', 'orange', 'lemon', 'peach','avocado','pear','cherry','mango','coconut')
# Find unique and shared words
shared_words <- intersect(df1, df2)
unique_df1_words <- setdiff(df1, df2)
unique_df2_words <- setdiff(df2, df1)
# Frequencies: df1-only words = 1, df2-only words = 2, shared words = 3 (largest)
freq_unique_df1 <- rep(1, length(unique_df1_words))
freq_unique_df2 <- rep(2, length(unique_df2_words))
freq_shared <- rep(3, length(shared_words))
# Word placement uses random numbers; fixing the seed makes the layout reproducible
set.seed(34)
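# wordcloud() has no `main` argument (extra arguments are passed on to text()),
# so draw the title in its own panel above the cloud
layout(matrix(c(1, 2), nrow = 2), heights = c(1, 4))
par(mar = rep(0, 4))
plot.new()
text(x = 0.5, y = 0.5, "Word Comparison: Shared and Unique Words")
wordcloud(
  words = c(shared_words, unique_df1_words, unique_df2_words),
  freq = c(freq_shared, freq_unique_df1, freq_unique_df2),
  colors = c("darkblue",   # freq 1: words only in df1
             "lightblue",  # freq 2: words only in df2
             "red"),       # freq 3: shared words (the most frequent)
  scale = c(3, 0.5),       # range of word sizes, largest to smallest
  min.freq = 1,            # keep every word (the default min.freq is 3)
  max.words = 200,
  random.order = FALSE,    # plot the most frequent (shared) words first, in the center
  rot.per = 0              # proportion of words rotated 90 degrees; 0 keeps all horizontal
)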
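If you want to double-check which words land in which group before plotting, a small lookup table can help. This is a sketch, not part of the answer above: word_table is a hypothetical name, and because union() keeps each distinct word once, duplicates inside df1 or df2 do no harm here.

```r
# Hypothetical helper table: one row per distinct word, with its group and
# the freq value used for the cloud (1 = df1 only, 2 = df2 only, 3 = shared).
df1 <- c('apple','banana','cherry', 'melon', 'grape', 'fig', 'guava', 'strawberry', 'blueberry')
df2 <- c('apple','pineapple','kiwi', 'banana', 'orange', 'lemon', 'peach','avocado','pear','cherry','mango','coconut')

all_words <- union(df1, df2)   # every distinct word exactly once
in1 <- all_words %in% df1
in2 <- all_words %in% df2

word_table <- data.frame(
  word  = all_words,
  group = ifelse(in1 & in2, "shared", ifelse(in1, "df1 only", "df2 only")),
  freq  = ifelse(in1 & in2, 3, ifelse(in1, 1, 2)),
  stringsAsFactors = FALSE
)

# Count the words in each group
print(table(word_table$group))

# The columns could then be passed to wordcloud(), for example:
# wordcloud(word_table$word, word_table$freq,
#           colors = c("darkblue", "lightblue", "red"),
#           random.order = FALSE, rot.per = 0, min.freq = 1)
```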