Find most common word(s) in character string value

I have data that looks like

df <- data.frame(A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"), B = c(3, 5, 8))

I want to find the most common word in each observation of variable A, where the words are separated by ", ".

All approaches I have found only extract the most common word in the entire column, such as

table(unlist(strsplit(df$A,", "))) %>% which.max() %>% names()

which gives me the same word for every row:

wrong_result <- data.frame(A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"), B = c(3, 5, 8), C = c("b", "b", "b"))

If two or more words are tied for most frequent, all of them should be extracted. The result should look like

result <- data.frame(A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"), B = c(3, 5, 8), C = c("a", "b", "a, b"))

Solution

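You can split each string and tabulate the words row by row. Comparing the counts against max(table(.x)) keeps ties, whereas which.max() would return only the first maximum: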

library(dplyr)
library(stringr)
library(purrr)
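df %>%
  # split each string into words, tabulate them, keep every word tied for the top count
  # map_chr() returns a character column rather than the list column map() would give
  mutate(maxi = map_chr(str_split(A, pattern = ", "),
                        ~ toString(names(which(table(.x) == max(table(.x)))))))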

#                    A B maxi
#1 a, a, a, b, b, c, c 3    a
#2 a, a, b, b, b, b, c 5    b
#3          a, a, b, b 8 a, b
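
If you would rather avoid the extra packages, the same per-row idea can probably be written in base R. This is only a sketch: most_common is a hypothetical helper name introduced here for illustration, and stringsAsFactors = FALSE is set on the assumption that A should be a character column regardless of R version.

```r
df <- data.frame(
  A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"),
  B = c(3, 5, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: given one vector of words, return the most
# frequent word(s), with ties collapsed into a single ", "-separated string.
most_common <- function(words) {
  counts <- table(words)
  toString(names(counts)[counts == max(counts)])
}

# Split each row's string on ", " and apply the helper row by row.
# vapply() guarantees one character value per row.
df$C <- vapply(strsplit(df$A, ", ", fixed = TRUE), most_common, character(1))
```

Because table() is computed once per row and vapply() checks the return type, this version should fail loudly rather than silently produce a list column if a row yields something unexpected.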