
Use double stemming for two languages in R


My corpus contains text in two languages, Russian and English:

Hello, how are you
Привет, как дела

Can I use double stemming for two languages, something like this?

tw.corpus <- tm_map(tw.corpus,stemDocument,  c("russian","english"))

Or do I need another approach?


Solution

  • stemDocument only uses the first element of the language vector you pass it. Your current code therefore stems only Russian, not English.

    To stem both languages, run the stemming twice, once per language.

    tw.corpus <- tm_map(tw.corpus, stemDocument, "russian")
    tw.corpus <- tm_map(tw.corpus, stemDocument, "english")
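
If more languages are added later, the two calls can be folded into a loop. The following is only a sketch: `stem_languages` is a hypothetical helper name (not part of tm), and the corpus is rebuilt from the two sample sentences in the question so that the snippet runs on its own. It assumes the tm and SnowballC packages are installed.

```r
library(tm)  # stemDocument() relies on the SnowballC package for the stemmers

# Hypothetical helper (not part of tm): run stemDocument once per language,
# in the order given, because a single call uses only one language.
stem_languages <- function(corpus, languages) {
  for (lang in languages) {
    corpus <- tm_map(corpus, stemDocument, language = lang)
  }
  corpus
}

# Small stand-in corpus built from the sample sentences in the question
tw.corpus <- VCorpus(VectorSource(c("Hello, how are you", "Привет, как дела")))
tw.corpus <- stem_languages(tw.corpus, c("russian", "english"))

# Inspect the stemmed text of each document
stemmed <- sapply(tw.corpus, as.character)
print(stemmed)
```

Note that each pass runs over every token, so the Russian stemmer also sees the English words and vice versa. It is worth checking a sample of the output to confirm that neither pass alters words from the other language in an unwanted way.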