I am trying to build a Shiny app that dynamically displays sentences from a database column by matching them against the text typed into a text box. That is, as the user starts typing, all sentences that match the typed words should be displayed, ordered by the number of words that match.
I tried the kwic() function, but it does not help me match the corpus dynamically. This is the approach I tried:
library(quanteda)
library(tm)  # only needed for the example "crude" data
data(crude, package = "tm")
mycorpus <- corpus(crude)
kwic(tokens(mycorpus), "company")  # pass the words from the text box here
Any help would be appreciated.
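For reference, the reactive part of such an app can be sketched as below. This is a minimal sketch under stated assumptions: it assumes the shiny package, uses a hypothetical `sentences` vector as a stand-in for the database column, and uses a hypothetical helper `count_matches()` that does plain case-insensitive exact-word matching in base R (not kwic()). Each time the text box changes, the table is recomputed and sorted by the number of typed words found in each sentence.

```r
library(shiny)

# Hypothetical stand-in for the database column of sentences;
# in a real app this would come from a database query.
sentences <- c(
  "Oil prices rose sharply today",
  "The company reported higher oil output",
  "Crude oil prices and company profits fell",
  "Weather was mild across the region"
)

# Hypothetical helper: for each sentence, count how many distinct typed
# words it contains (case-insensitive, exact word match after splitting
# on non-alphanumeric characters).
count_matches <- function(sentences, query) {
  terms <- unique(tolower(strsplit(trimws(query), "[^[:alnum:]]+")[[1]]))
  terms <- terms[nzchar(terms)]
  words <- strsplit(tolower(sentences), "[^[:alnum:]]+")
  vapply(words, function(w) sum(terms %in% w), integer(1))
}

ui <- fluidPage(
  textInput("query", "Type some words:"),
  tableOutput("matches")
)

server <- function(input, output, session) {
  output$matches <- renderTable({
    req(input$query)  # show nothing until the user has typed something
    n <- count_matches(sentences, input$query)
    keep <- n > 0
    res <- data.frame(sentence = sentences[keep], matched_words = n[keep])
    # Most matched words first
    res[order(-res$matched_words), , drop = FALSE]
  })
}

# Assigned rather than printed; run it interactively with runApp(app)
app <- shinyApp(ui, server)
```

Swapping count_matches() for a quanteda-based count (for example one based on word stems) only changes the helper; the reactive wiring stays the same.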
I think what you're asking for is,
table(kwic(mycorpus, phrase, join = FALSE)$keyword)
where phrase just gets lengthened as more terms are typed in. (This requires quanteda >= 0.99, which also includes the phrase() function, which might be useful here.) For a more general match, you could convert both the corpus and all entered terms (the ever-lengthening phrase) into tokenized word stems:
mystems <- tokens(corpus(crude)) %>% tokens_wordstem()
# phrase: the text typed so far
phrase <- tokens(phrase, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_wordstem(language = "english") %>%  # must match the stemmer used for the corpus
  as.character()
Then table(kwic(mystems, phrase, join = FALSE)$keyword) should do the same thing, but matching word stems rather than exact words. If you want the number of matching words in each document, an *apply-type wrapper (or purrr::map()) will extract that too.
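A sketch of that *apply wrapper, assuming quanteda and the tm "crude" data as above; `typed` is a hypothetical value standing in for the text box contents. Both sides are lowercased and stemmed with the same (default English) stemmer, because %in% is case-sensitive, unlike kwic(). The count here is how many distinct typed stems occur in each document, which gives the ordering the question asks for.

```r
library(tm)        # only used for the example "crude" data
library(quanteda)
data(crude, package = "tm")

# Stem every document in the corpus (default stemmer language: English)
mystems <- tokens(corpus(crude), remove_punct = TRUE) %>%
  tokens_tolower() %>%
  tokens_wordstem()

# Hypothetical text typed so far into the Shiny text box
typed <- "oil prices companies"
phrase <- tokens(typed, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_tolower() %>%
  tokens_wordstem() %>%
  as.character()

# *apply wrapper: for each document, count how many distinct typed stems it contains
counts <- sapply(as.list(mystems), function(x) sum(unique(phrase) %in% x))

# Documents ordered by number of matching words, most first
head(sort(counts, decreasing = TRUE))
```

purrr::map_int(as.list(mystems), ...) with the same function would give the same counts.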