I have created two corpora: one containing tweet texts and another containing company names. I'm trying to find out which companies are mentioned in which tweets.
Example document of a tweet:
```r
> writeLines(as.character(tweet_corp[[175]]))
general motor send mexican made model chevi cruze us car dealer tax free across border make usaor pay big border tax
```
Example document of a company:
```r
> writeLines(as.character(company_corp[[1397]]))
general motor
```
I would like output that pairs tweet_corp[[175]] with company_corp[[1397]]. Is there any way to do this?
You could use the stringr package to check whether a company name occurs in a tweet, e.g.:
```r
library(stringr)

company_name <- "general motor"
tweet <- "general motor send mexican made model chevi cruze us car dealer tax free across border make usaor pay big border tax"

# Check whether the company name occurs in the tweet.
# coll() treats the pattern as a literal string rather than a regex.
str_detect(
  string = tweet,
  pattern = coll(company_name)
)
#> [1] TRUE
```
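To get every tweet/company pair at once (the equivalent of "tweet 175 mentions company 1397" for the whole data set), the same check can be repeated for each company name and the hits collected as index pairs. The sketch below uses small made-up vectors in place of the two corpora; the `sapply(..., as.character)` conversion shown in the comment assumes the corpora are `tm` `VCorpus` objects and may need adjusting for how yours were built.

```r
library(stringr)

# Hypothetical stand-ins for the two corpora. Assuming tm VCorpus objects,
# something like the following should give plain character vectors:
#   tweets    <- sapply(tweet_corp, as.character)
#   companies <- sapply(company_corp, as.character)
tweets <- c(
  "general motor send mexican made model chevi cruze us car dealer tax free across border make usaor pay big border tax",
  "nothing about car makers here",
  "ford and general motor both mentioned"
)
companies <- c("ford", "general motor", "toyota")

# One str_detect() call per company; each call is vectorised over all tweets
# and returns one TRUE/FALSE per tweet.
hits <- vapply(
  companies,
  function(co) str_detect(tweets, coll(co)),
  logical(length(tweets))
)
# Force a tweets x companies matrix even when there is only one tweet.
dim(hits) <- c(length(tweets), length(companies))

# Row/column indices of every TRUE cell = (tweet index, company index) pairs.
matches <- which(hits, arr.ind = TRUE)
result <- data.frame(
  tweet   = matches[, "row"],
  company = matches[, "col"],
  name    = companies[matches[, "col"]]
)
result
```

Because `coll()` does plain substring matching, a short name such as "ford" would also match inside "afford". If that matters, a regex with word boundaries, e.g. `regex(paste0("\\b", co, "\\b"))`, is one option, although company names containing regex metacharacters would then need escaping first.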