R dictionary: create a many-to-one mapping

Consider the following minimal example from a text mining exercise using the R package tm. Toyota has several SUV models in the US:

models <- c("highlander", "land cruiser", "rav4", "sequoia", "4runner")

The general media refers to these not as "toyota rav4" but simply as "rav4" (the corpus has already been transformed to lower case). To get a single column of Toyota SUVs in a DocumentTermMatrix, I need to convert all of these model names into one generic term, "toyota_suv".

What I am doing now is repeating

mycorpus <- tm_map(mycorpus, gsub, pattern = "rav4", replacement = "toyota_suv")

once for each element of models. A hack would be to set up model_names <- rep("toyota_suv", length(models)) and get on with life.

How can I set up a dictionary with a many-to-one mapping, so that all models are replaced with "toyota_suv" in one expression? Many thanks.

Solution

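You can use a vectorized substitution function. The stringi package provides one in its stri_replace_all family. Here I'm using stri_replace_all_fixed, which matches the patterns literally. With vectorize_all = FALSE, every pattern in toyota_suvs is applied to every document, and the single replacement is reused for all of them. Adjust case sensitivity and other options as needed.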

library(tm)
library(stringi)

toyota_suvs <- c("highlander","land cruiser","rav4","sequoia","4runner")

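# content_transformer() keeps the documents as tm objects; recent tm versions
# expect custom functions passed to tm_map() to be wrapped this way
tm_map(toyCorp, content_transformer(stri_replace_all_fixed),
    pattern = toyota_suvs, replacement = "toyota_suv",
    vectorize_all = FALSE)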
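
To see the effect without tm, the same call can be tried on a plain character vector. This is only a minimal sketch: the sentences in docs are made up for illustration and are not part of the original data.

```r
library(stringi)

# Hypothetical sample sentences, lower-cased like the corpus in the question
docs <- c("the rav4 and the 4runner are both popular",
          "i want a land cruiser",
          "no suv mentioned here")

toyota_suvs <- c("highlander", "land cruiser", "rav4", "sequoia", "4runner")

# vectorize_all = FALSE: every pattern is applied to every string, and the
# single replacement string is reused for all patterns
out <- stri_replace_all_fixed(docs, pattern = toyota_suvs,
                              replacement = "toyota_suv",
                              vectorize_all = FALSE)
print(out)
```

Both model names in the first sentence are replaced, and the third sentence is returned unchanged because it contains none of the patterns.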

Data (run this first to create toyCorp):

toyExample <- c("you don't know about the rav4, John Snow",
    "the highlander is a great car",
    "I want a land cruiser")

toyCorp <- Corpus(VectorSource(toyExample))
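
If several groups need the same treatment, the many-to-one mapping can also be written as a named list and applied with base R. The sketch below is one possible approach, not part of the answer above: apply_dictionary is a hypothetical helper, the honda_suv entry is invented purely to show a second group, and the terms are assumed to contain no regex metacharacters that would need escaping. It uses whole-word regex matching instead of fixed matching, so "highlanders" is left alone.

```r
# Hypothetical dictionary: each generic label maps to the terms it replaces.
# The honda_suv entry is an invented second group, for illustration only.
dictionary <- list(
  toyota_suv = c("highlander", "land cruiser", "rav4", "sequoia", "4runner"),
  honda_suv  = c("cr-v", "hr-v")
)

# Replace every term in the dictionary by its label, matching whole words only.
# Assumes the terms contain no regex metacharacters that need escaping.
apply_dictionary <- function(x, dictionary) {
  for (label in names(dictionary)) {
    # Build one alternation per label, e.g. \b(highlander|land cruiser|...)\b
    pattern <- paste0("\\b(", paste(dictionary[[label]], collapse = "|"), ")\\b")
    x <- gsub(pattern, label, x, perl = TRUE)
  }
  x
}

# Hypothetical sample sentences
docs <- c("the highlander and the cr-v",
          "highlanders is a tv series",
          "i want a land cruiser")

result <- apply_dictionary(docs, dictionary)
print(result)
```

With tm, the helper could presumably be applied as tm_map(mycorpus, content_transformer(apply_dictionary), dictionary = dictionary), since tm_map forwards extra arguments to the transformation function.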