Tags: r, aggregate, unique, lapply

unique words by group


This is my example data frame:

example = data.frame(group = c("A", "B", "A", "A"), word = c("car", "sun ,sun, house", "car, house", "tree"))

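I would like to keep only the words that are unique both within a group and across groups: each word should be listed once per group, and a word that occurs in more than one group (here "house") should be dropped entirely.

So I would like to get this: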

group   word
A       car, tree
B       sun

I used aggregate and got this:

aggregate(word ~ group , data = example,  FUN = paste0) 

  group                  word
1     A car, car, house, tree
2     B       sun ,sun, house

But now I need to select only the unique values, and even this does not work:

for (i in 1:nrow(cluster)) {cluster[i, ][["word"]] = lapply(unlist(cluster[i, ][["word"]]), unique)}

It fails with:

Error in `[[<-.data.frame`(`*tmp*`, "word", value = list("car", "car, house",  : 
  replacement has 3 rows, data has 1

Solution

  • A base R option using aggregate + subset + ave, as shown below:

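    with(
      # split each group's entries into single words (list column `word`)
      aggregate(
        word ~ .,
        example,
        function(x) {
          unlist(strsplit(x, "[, ]+"))
        }
      ),
      # collect the remaining words per group
      aggregate(
        . ~ ind,
        subset(
          # long format (values, ind), duplicates within a group dropped
          unique(stack(setNames(word, group))),
          # keep only words that occur in exactly one group
          ave(seq_along(ind), values, FUN = length) == 1
        ),
        c
      )
    )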
    

    gives

      ind    values
    1   A car, tree
    2   B       sun
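
The nested call above can be hard to follow, so here is a minimal step-by-step sketch of the same idea in base R. It is not the answer's exact code: the intermediate names (`words`, `long`, `keep`, `result`) are made up for illustration, and it collapses the words with `toString` instead of `c`, so the `word` column ends up as a plain character string rather than a list.

```r
# Same example data as in the question
example <- data.frame(
  group = c("A", "B", "A", "A"),
  word  = c("car", "sun ,sun, house", "car, house", "tree"),
  stringsAsFactors = FALSE
)

# 1. Split every entry into single words (separators: commas and/or spaces)
words <- strsplit(example$word, "[, ]+")

# 2. Long format: one row per (group, word) pair;
#    unique() drops words repeated within the same group
long <- unique(data.frame(
  group = rep(example$group, lengths(words)),
  word  = unlist(words),
  stringsAsFactors = FALSE
))

# 3. After step 2 a word appears once per group, so a count of 1
#    means the word occurs in exactly one group
keep <- long[ave(seq_along(long$word), long$word, FUN = length) == 1, ]

# 4. Collapse the surviving words back into one string per group
result <- aggregate(word ~ group, data = keep, FUN = toString)
result
```

For the example data this should leave "car, tree" for group A and "sun" for group B, matching the desired output. The split pattern assumes that no entry starts with a separator; otherwise `strsplit` would produce an empty leading string that would need to be filtered out.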