I'm trying to combine two words into one using the content_transformer function from the tm package in R.
For example, I have location data, and to create word clouds I need to join multi-word names such as "san jose", "san diego", and "san francisco"; otherwise "san" comes up as the most frequent word.
The furthest I've gotten is writing a function, for example:
combineUK <- content_transformer(function(x, pattern)
  gsub(pattern, "UK", x, ignore.case = TRUE))
However, writing a separate function for each town is unrealistic.
I was wondering whether there's any way to use the paste() function within content_transformer?
So, perhaps I'm missing something obvious.
Since you did not provide a full reproducible example (one that can be copied, pasted, and run), I don't know what you have and what you want. However, consider, for example:
library(tm)
library(wordcloud)
par(mfrow = c(2,1), cex=.5)
txt <- c("hello san jose dudes", "welcome to san diego", "Did you like san francisco")
corp <- Corpus(VectorSource(txt))
wordcloud(corp, min.freq=1)
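# Drop the single character (here, the space) between "san" and the next word, e.g. "san jose" -> "sanjose"
corp <- tm_map(corp, content_transformer(function(x) gsub("(san).(\\w+)", "\\1\\2", x, ignore.case = TRUE)))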
wordcloud(corp, min.freq=1)
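If the place names do not share a common first word, one possible generalisation is to keep them in a vector and build each pattern with paste0(). The following is only a sketch in base R: the helper name join_phrases and the towns vector are made up for illustration and are not part of tm. Note that it replaces each match with the lower-case phrase as given in the vector, which may or may not be what you want.

```r
# Hypothetical helper (not part of tm): glue each known multi-word
# phrase into a single token by removing its internal whitespace.
join_phrases <- function(x, phrases) {
  for (p in phrases) {
    # "san jose" -> pattern "\\bsan\\s+jose\\b" (whole words, any amount of whitespace)
    pattern <- paste0("\\b", gsub(" +", "\\\\s+", p), "\\b")
    # "san jose" -> replacement "sanjose"
    replacement <- gsub(" +", "", p)
    x <- gsub(pattern, replacement, x, ignore.case = TRUE, perl = TRUE)
  }
  x
}

# Assumed example data, mirroring the texts used above
towns <- c("san jose", "san diego", "san francisco")
txt <- c("hello san jose dudes", "welcome to san diego", "Did you like san francisco")

joined <- join_phrases(txt, towns)
print(joined)
# "hello sanjose dudes" "welcome to sandiego" "Did you like sanfrancisco"
```

With tm loaded, the same helper should be usable on a corpus via tm_map(corp, content_transformer(join_phrases), phrases = towns), since tm_map passes extra arguments on to the transformer.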