A small data frame:
words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"))
If I use qdap::which_misspelled, I can find the misspelled words:
> which_misspelled(words)
1 8
"qhick" "dgs"
But what I want is to subset the words data frame to the rows that contain a misspelling. The call above returns indices 1 and 8, which are positions among all the words in my df pooled together, not row numbers.
How can I subset my df based on any rows that contain misspelled words?
(Bonus if it can be done with dplyr filter)
How about just using check_spelling, which is vectorized? Its result contains a row column of row numbers that you can use to subset the data frame:
library(qdap)
words[check_spelling(words$terms)$row, , drop = FALSE]
# terms
#1 qhick brown fox
#3 cats dgs
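For the dplyr bonus, the same row numbers can probably be fed to filter. This is only a sketch: it assumes check_spelling returns a row column as above, and the names bad_rows and misspelled are made up for illustration. stringsAsFactors = FALSE is set so that terms is plain character regardless of R version.

```r
library(qdap)
library(dplyr)

words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"),
                    stringsAsFactors = FALSE)

# Row numbers of entries with at least one word not found in the dictionary
# (bad_rows is a hypothetical helper name)
bad_rows <- unique(check_spelling(words$terms)$row)

# Keep only the rows whose position appears in bad_rows
misspelled <- words %>%
  filter(row_number() %in% bad_rows)

misspelled
```

Note that filter does not keep the original row names, so the surviving rows will be renumbered from 1.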
The which_misspelled function seems meant to check a single string rather than a data frame. From its documentation:
which_misspelled - Check the spelling for a string.
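If you would rather stay with which_misspelled, one option is to call it once per row and keep the rows where it reports anything. A minimal sketch, assuming which_misspelled returns nothing (NULL or a zero-length result) for a string with no misspellings; has_typo is a made-up name:

```r
library(qdap)

words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs"),
                    stringsAsFactors = FALSE)

# Check each row's string on its own; TRUE when at least one word is flagged
# (has_typo is a hypothetical helper name)
has_typo <- vapply(as.character(words$terms),
                   function(x) length(which_misspelled(x)) > 0,
                   logical(1),
                   USE.NAMES = FALSE)

# Subset with the logical vector, keeping the data frame structure
words[has_typo, , drop = FALSE]
```

The logical vector should also work with dplyr, e.g. dplyr::filter(words, has_typo), although check_spelling does the same job in one call.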