Tags: r, text-processing, dplyr, corpus

How to use the singularize function in mutate?


I have a text corpus and as part of the preprocessing I need to singularize all words.

Let's say we have a corpus of words:

library(dplyr)

corpus <- tibble(word = c("house", "friends", "cats", "dogs"))

If I apply the singularize function (from SemNetCleaner) to a single word, it works. However, I would need a slow for-loop to apply it to every row of my word column:

#install.packages("SemNetCleaner")
library(SemNetCleaner)

corpus[2,1] %>% unlist() %>% singularize()

  word 
"friend"

However, if I use it inside mutate(), it concatenates all entries into a single string, much like paste() would, and repeats that string in every row:

corpus %>% mutate(singular = singularize(word))

# A tibble: 4 x 2
  word    singular              
  <chr>   <chr>                 
1 house   house friends cats dog
2 friends house friends cats dog
3 cats    house friends cats dog
4 dogs    house friends cats dog

Solution

  • Use rowwise(), so that mutate() calls singularize() once per row instead of once on the whole word column:

    corpus %>% rowwise() %>% mutate(singular = singularize(word))
    # A tibble: 4 x 2
    # Rowwise: 
      word    singular
      <chr>   <chr>   
    1 house   house   
    2 friends friend  
    3 cats    cat     
    4 dogs    dog
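
Why this happens, and an alternative to rowwise(): mutate() passes the whole word column to the function as one character vector, so a function that expects a single string sees all four words at once. The sketch below illustrates this with toy_singularize(), a hypothetical stand-in written only for this example (it is not SemNetCleaner's implementation and only strips a trailing "s"). It then applies the function element by element with vapply(), which should work the same way with the real singularize(), assuming that function returns one string per input word.

```r
library(dplyr)

# Hypothetical stand-in for a non-vectorized function: it collapses its input
# into one string and strips a trailing "s". NOT the real singularize().
toy_singularize <- function(x) {
  x <- paste(x, collapse = " ")
  sub("s$", "", x)
}

corpus <- tibble(word = c("house", "friends", "cats", "dogs"))

# Called on the whole column, the function sees all words at once.
collapsed <- toy_singularize(corpus$word)

# vapply() calls the function once per element and checks that each call
# returns exactly one string, so the result has one entry per row.
result <- corpus %>%
  mutate(singular = vapply(word, toy_singularize, character(1), USE.NAMES = FALSE))

print(result)
```

Unlike rowwise(), this leaves the tibble ungrouped, so no ungroup() is needed before later steps. purrr::map_chr(word, singularize) is an equivalent option if purrr is already loaded.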