I have data that looks like
df <- data.frame(A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"), B = c(3, 5, 8))
I want to find the most common word (words are separated by ", ") for each observation of variable A.
All approaches I have found only extract the most common word in the entire column, such as
table(unlist(strsplit(df$A,", "))) %>% which.max() %>% names()
and I get
wrong_result <- data.frame(A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"), B = c(3, 5, 8), C = c("b", "b", "b"))
If two words are equally frequent they should both be extracted. The result should look like
result <- data.frame(A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"), B = c(3, 5, 8), C = c("a", "b", "a, b"))
You can do:
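library(dplyr)
library(stringr)
library(purrr)

# Split each row of A into words, tabulate them, and keep every word
# whose count equals the row maximum (so ties are all returned).
# map_chr() returns a plain character column rather than a list column.
df %>%
  mutate(maxi = map_chr(str_split(A, pattern = ", "),
                        ~ toString(names(which(table(.x) == max(table(.x)))))))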
#                    A B maxi
#1 a, a, a, b, b, c, c 3    a
#2 a, a, b, b, b, b, c 5    b
#3          a, a, b, b 8 a, b
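If you would rather avoid the extra packages, the same per-row logic can be sketched in base R. This is only a minimal sketch: `most_common` is a hypothetical helper name introduced here for illustration, and it assumes every element of A is a non-missing, non-empty string with words separated by exactly ", ".

```r
# Example data from the question
df <- data.frame(
  A = c("a, a, a, b, b, c, c", "a, a, b, b, b, b, c", "a, a, b, b"),
  B = c(3, 5, 8),
  stringsAsFactors = FALSE
)

# most_common() is a hypothetical helper, not part of any package.
# It takes ONE string, splits it on the separator, counts the words,
# and returns all words tied for the highest count as one string.
most_common <- function(x, sep = ", ") {
  counts <- table(strsplit(x, sep, fixed = TRUE)[[1]])
  toString(names(counts)[counts == max(counts)])
}

# Apply the helper to each row of A; vapply() guarantees a character vector.
df$C <- vapply(df$A, most_common, character(1), USE.NAMES = FALSE)
df
```

Because table() orders its names alphabetically, tied words come back in sorted order (for example "a, b") rather than in order of first appearance. NA or empty strings are not handled by this sketch and would need an explicit check.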