r dplyr qdap

dplyr filter using qdap::which_misspelled OR dplyr filter with a nested function


A small data frame:

words <- data.frame(terms = c("qhick brown fox",
          "tom dick harry", 
          "cats dgs"))

If I use qdap::which_misspelled I can find the misspelled words:

> which_misspelled(words)
      1       8 
"qhick"   "dgs" 

But what I want is to subset the words data frame to the rows that contain a misspelling. The call above returns positions 1 and 8, which index into all the words in the data frame taken together, regardless of which row they come from.

How can I subset my df based on any rows that contain misspelled words?

(Bonus if it can be done with dplyr's filter.)


Solution

  • How about using check_spelling instead? It is vectorized, and its result contains a row column of row numbers that you can use to subset the data frame:

    library(qdap)
    words[check_spelling(words$terms)$row, , drop = FALSE]
    
    #            terms
    #1 qhick brown fox
    #3        cats dgs
    

    The which_misspelled function appears to be meant for checking a single string rather than a data frame. Its documentation describes it as:

    which_misspelled - Check the spelling for a string.
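For the dplyr bonus, the same row numbers can probably be fed to filter. The following is a minimal sketch rather than a tested recipe. It assumes that check_spelling returns a `row` column as in the answer above, and it converts `terms` with as.character in case the column is a factor. The name `misspelled_rows` is made up for illustration.

```r
library(qdap)
library(dplyr)

# Same toy data as in the question; keep terms as character, not factor
words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"),
                    stringsAsFactors = FALSE)

# check_spelling() runs once on the whole column; its $row column holds the
# row numbers with at least one word not found in the dictionary.
# Keep only the rows whose position appears in that set.
misspelled_rows <- dplyr::filter(
  words,
  dplyr::row_number() %in% check_spelling(as.character(terms))$row
)

misspelled_rows
```

If no misspellings are found, `$row` is empty and filter should simply return zero rows. The functions are namespaced with `dplyr::` because qdap and dplyr both attach a number of names, which makes masking easy to miss.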