How can I search for variations of a word without typing out every variation (in R)?


I need to check whether any variation of a word appears in a text. How can I do that without typing every variation out? For example, if I need to search for the word 'broken', is there a way in R to look for that word and its variations?

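library(stringr)  # provides str_detect()

a <- "Broken flask"
b <- "fragmented flask"
c <- "broke glass"
d <- "shattered glass"
e <- "break flask"
text <- c(a, b, c, d, e)
# Current approach: every variation is typed out by hand
str_detect(tolower(text), "broken|fragmented|broke|break|shatter|shattered")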

Solution

  • You could check out syn from the syn package, which generates synonyms for a given word, allowing you to do:

    library(syn)
    
    # Build one alternation pattern from "broken" plus its generated synonyms
    grepl(paste0(c("broken", syn("broken")), collapse = "|"), text, ignore.case = TRUE)
    #> [1]  TRUE  TRUE  TRUE  TRUE FALSE
    

    This picked up 4 of the 5 strings without any variations being typed by hand. Only "break flask" was missed, because none of the generated synonyms matched it.
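
To also catch an inflected form such as "break flask", one option is to match on word stems rather than whole words. The following is a minimal sketch in base R and does not use the syn package. The `stems` vector is a hand-picked assumption for this example, and stems this short can also match unrelated words (for instance, "brok" would match "broker"), so treat it as a starting point rather than a complete solution.

```r
# Example strings from the question
text <- c("Broken flask", "fragmented flask", "broke glass",
          "shattered glass", "break flask")

# Hypothetical, hand-picked stems (an assumption for this sketch)
stems <- c("brok", "break", "fragment", "shatter")

# \\b anchors each stem at the start of a word; \\w* allows any suffix
pattern <- paste0("\\b(", paste(stems, collapse = "|"), ")\\w*")

matches <- grepl(pattern, text, ignore.case = TRUE, perl = TRUE)
print(matches)
#> [1] TRUE TRUE TRUE TRUE TRUE
```

The two approaches can be combined by appending such stems to the synonym vector before collapsing it with `"|"`.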