
How can I lemmatize English words (for example, 'run' and 'ran') using R to bring them all to the same tense?


I want to lemmatize English words so that all of them are converted to the same tense. For example:

c("ran","run","running") 

should become c("run","run","run").

I have already explored R packages such as tm, wordnet, RTextTools, and SnowballC, but all of them produce the output c("ran","run","run"). As you can see, they do not convert "ran" to "run".


Solution

  • Have a look at the textstem package I maintain:

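    # Install pacman if it is missing, then use it to install/load textstem
    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(textstem)
    
    # Lemmatize a vector of individual words
    lemmatize_words(c("ran","run","running"))
    ## [1] "run" "run" "run"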
    

    Note that if you have whole strings (for example, sentences) rather than a vector of individual words, you may want the lemmatize_strings function instead.
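
    As a rough sketch of the string case, the snippet below passes whole sentences to lemmatize_strings. The example sentences and the variable names (sentences, lemmas) are made up for illustration, and the code assumes textstem is already installed. The function returns one string per input element; the exact lemmas depend on the lemma dictionary textstem uses by default, so no output is shown here.

    ```r
    # Assumes textstem is already installed (see the pacman call above)
    library(textstem)

    # Hypothetical example sentences, not taken from the question
    sentences <- c(
      "He ran while the dogs were running",
      "She runs every morning"
    )

    # Lemmatize each word inside each sentence; returns a character
    # vector with one element per input sentence
    lemmas <- lemmatize_strings(sentences)
    print(lemmas)
    ```

    Use lemmatize_words when each element is already a single token, and lemmatize_strings when each element still needs to be split into words.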