
Hunspell: working around the "out of bounds" error for empty suggestions in R


I am trying to automatically spell-check a string column of a data.table/data.frame.

Looking around, I found several approaches, but they all give an "out of bounds" error when hunspell_suggest returns no suggestions for a word (that is, an empty element, e.g. for "pippasnjfjsfiadjg"). The accepted answer to one related question yields NA instead, so it does work in principle.

It seems we need unlist to identify these empty suggestions and then exclude them from the part of the code that picks the first suggestion, but I cannot figure out how.

library(dplyr)
library(stringi)
library(hunspell)

df1 <- data.frame("Index" = 1:7, "Text" = c("pippasnjfjsfiadjg came to dinner with us tonigh.",
                                            "Wuld you like to trave with me?",
                                            "There is so muh to undestand.",
                                            "Sentences cone in many shaes and sizes.",
                                            "Learnin R is fun",
                                            "yesterday was Friday",
                                            "bing search engine"),
                  stringsAsFactors = FALSE)

# Get bad words.
badwords <- hunspell(df1$Text) %>% unlist

# Extract the first suggestion for each bad word.
# This fails with "subscript out of bounds" when a word has no suggestions.
suggestions <- sapply(hunspell_suggest(badwords), "[[", 1)

mutate(df1, Text = stri_replace_all_fixed(str = Text,
                                          pattern = badwords,
                                          replacement = suggestions,
                                          vectorize_all = FALSE)) -> out

Solution

  • You'll want to filter the bad words and their suggestions to drop any word that has no suggestion:

    badwords <- hunspell(df1$Text) %>% unlist()
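    # Note the use of '[' rather than '[[': '[' returns NA for an empty
    # suggestion vector, whereas '[[' raises "subscript out of bounds".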
    suggestions <- sapply(hunspell_suggest(badwords), '[', 1)
    
    badwords <- badwords[!is.na(suggestions)]
    suggestions <- suggestions[!is.na(suggestions)]
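
The filtering step can be tried without a dictionary installed. The sketch below is only an illustration: it replaces the result of hunspell_suggest with a hand-written list (hypothetical values, not real hunspell output), and it uses a base-R gsub loop as a stand-in for the stri_replace_all_fixed call from the question.

```r
# Hand-written stand-in for hunspell_suggest(badwords): one character vector
# per bad word, zero-length when there is no suggestion.
# These values are assumptions for illustration, not real hunspell output.
badwords <- c("pippasnjfjsfiadjg", "tonigh", "Wuld")
suggestion_list <- list(character(0),
                        c("tonight", "tong"),
                        c("Would", "Wild"))

# '[[' cannot index into the empty element, so the whole sapply call errors.
strict <- tryCatch(sapply(suggestion_list, "[[", 1),
                   error = function(e) "error")

# '[' returns NA for the empty element instead of failing.
suggestions <- sapply(suggestion_list, "[", 1)

# Keep only the words that actually have a suggestion.
keep <- !is.na(suggestions)
badwords <- badwords[keep]
suggestions <- suggestions[keep]

# Apply the remaining replacements one after another (literal matching).
text <- "pippasnjfjsfiadjg came to dinner with us tonigh. Wuld you join?"
fixed <- text
for (i in seq_along(badwords)) {
  fixed <- gsub(badwords[i], suggestions[i], fixed, fixed = TRUE)
}
print(fixed)
```

The word without a suggestion is left untouched in the text. With the real data, the filtered badwords and suggestions can be passed to the mutate/stri_replace_all_fixed call from the question unchanged.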