r algorithm nlp

Fast way to search for a series of keywords among articles


To illustrate with an example:

I have a few keywords (case sensitive).

 kw <- c("American Express", "Inc said")

I have quite a few articles.

 library(tm) # provides the acq corpus
 data("acq")
 # flatten the corpus so that dv is just a vector of strings
 dv <- sapply(seq_along(acq), function(x) acq[[x]]$content)

I want the following table as output (one row per article, one column per keyword):

 temp <- sapply(seq_along(kw), function(x) stringr::str_detect(dv, kw[x]))

The problem is, I have millions of records and the method that I am using is not efficient enough.


Solution

  • What about parallelizing? This is an example based on your code:

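    library(parallel)
    
    n_cores <- 2 # number of cores for parallel processing
    cl <- makeCluster(n_cores)
    # each worker tests one article against every keyword; kw is passed explicitly
    # because workers do not see the main session's variables
    temp <- parSapply(cl, dv, FUN = function(doc, kw) stringr::str_detect(doc, kw),
                      kw = kw, USE.NAMES = FALSE)
    temp <- t(temp) # rows = articles, columns = keywords, as in the question
    stopCluster(cl)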
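
Since the keywords are literal, case-sensitive strings rather than patterns, it may also be worth trying fixed-string matching before (or in addition to) parallelizing, as it skips the regex engine. A minimal sketch in base R, assuming a small hypothetical `dv` in place of the real articles; whether it is fast enough for millions of records would need to be measured:

```r
# Hypothetical stand-ins for the question's `kw` and `dv`
kw <- c("American Express", "Inc said")
dv <- c("American Express Co said it will sell a unit.",
        "Shearson Lehman Brothers Inc said it is expanding.",
        "american express is not matched because case matters.")

# fixed = TRUE treats each keyword as a literal string, not a regex
hits <- vapply(kw, function(k) grepl(k, dv, fixed = TRUE), logical(length(dv)))
hits # logical matrix: one row per article, one column per keyword
```

The same idea should carry over to the parallel version by swapping the `str_detect` call for `grepl(..., fixed = TRUE)` inside the worker function.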