r, text-mining, hunspell

Combine lists of strings of different lengths to a data frame


I have text data containing English spelling mistakes that need correcting.

I want the output to be a table with the mistakes in the first column and all suggested corrections for each mistake in the second column.

For example:

sentence <- "This is a word but thhis isn't and this onne as well. I need hellp"

library(hunspell)
mistakesList <- hunspell(sentence)[[1]]
suggestionsList <- hunspell_suggest(mistakesList)

I've tried

do.call(rbind, Map(data.frame, A=mistakesList, B=suggestionsList))

but it returns one row per suggestion:

            A      B
thhis   thhis   this
onne.1   onne   none
onne.2   onne    one
onne.3   onne  tonne
onne.4   onne  Donne
onne.5   onne   once
onne.6   onne   Anne
onne.7   onne Yvonne
hellp.1 hellp  hello
hellp.2 hellp   hell
hellp.3 hellp   help
hellp.4 hellp hell p

I want a data frame with one row per mistake, like this:

mistakes  suggestions
thhis     this
onne      none one tonne Donne once Anne Yvonne
hellp     hello hell help hell p

Solution

  • Keep mistakesList as it is and collapse each element of suggestionsList into a single comma-separated string with toString.

    data.frame(mistakes = mistakesList, suggestions = sapply(suggestionsList, toString))
    
    
    #  mistakes                               suggestions
    #1    thhis                                      this
    #2     onne none, one, tonne, Donne, once, Anne, neon
    #3    hellp                 hello, hell, help, hell p
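
The same idea can be sketched without the hunspell package. In the version below, mistakesList and suggestionsList are hard-coded stand-ins copied from the output shown in the question (an assumption, since real suggestions depend on the installed dictionary). It uses vapply with paste(collapse = " ") to get the space-separated layout asked for, and also shows a list column as an alternative that keeps each suggestion as a separate element. The names result and result_list are illustrative only.

```r
# Hard-coded stand-ins for the hunspell results shown in the question,
# so this sketch runs with base R only (assumed values, not fresh hunspell output).
mistakesList <- c("thhis", "onne", "hellp")
suggestionsList <- list(
  "this",
  c("none", "one", "tonne", "Donne", "once", "Anne", "Yvonne"),
  c("hello", "hell", "help", "hell p")
)

# Collapse each suggestion vector into one space-separated string.
# vapply() checks that every element yields exactly one character value.
result <- data.frame(
  mistakes = mistakesList,
  suggestions = vapply(suggestionsList, paste, character(1), collapse = " "),
  stringsAsFactors = FALSE
)
print(result)

# Alternative: store the suggestions as a list column, one vector per row,
# so individual suggestions stay separable later.
result_list <- data.frame(mistakes = mistakesList, stringsAsFactors = FALSE)
result_list$suggestions <- suggestionsList
```

One design note: a suggestion such as "hell p" itself contains a space, so a space-separated string cannot be split back into the original suggestions unambiguously. The comma-separated toString form or the list column avoids that problem.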