I have a dataset all_transcripts with a column ConvID and a column Name:
>all_transcripts
ConvID Name
5 Guest
5 Guest
5 Agent
5 Guest
5 Agent
6 Reception
6 Guest
6 Agent
6 Guest
6 Guest
7 Reception
7 Reception
7 Guest
7 Guest
7 Reception
8 Reception
8 Guest
8 Agent
I want to get the unique names per ConvID.
My desired output looks like:
5 ['Guest','Agent']
6 ['Reception','Guest','Agent']
7 ['Reception','Guest']
8 ['Reception','Guest','Agent']
To do so, I tried the aggregate function as follows:
aggregate(interactionId~name, all_transcripts, FUN= 'unique')
But this does not work. How can I change my code so I get the desired output?
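For reference, the aggregate() attempt above mainly fails because the formula uses column names that are not in the data and has the two sides swapped: the column to summarise goes on the left and the grouping column on the right. A minimal base R sketch, assuming the data shown in the question (rebuilt here as a hypothetical df so the snippet runs on its own) and assuming a comma-separated string per ConvID is acceptable:

```r
# Hypothetical copy of the example data, so the snippet is self-contained
df <- data.frame(
  ConvID = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
             7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L),
  Name = c("Guest", "Guest", "Agent", "Guest", "Agent",
           "Reception", "Guest", "Agent", "Guest", "Guest",
           "Reception", "Reception", "Guest", "Guest", "Reception",
           "Reception", "Guest", "Agent"),
  stringsAsFactors = FALSE
)

# Formula is value ~ group: summarise Name within each ConvID.
# toString() collapses the unique names into one string per group,
# so the result is an ordinary character column.
res <- aggregate(Name ~ ConvID, data = df,
                 FUN = function(x) toString(unique(x)))
print(res)
```

unique() keeps names in order of first appearance, so ConvID 5 gives "Guest, Agent" rather than an alphabetical ordering.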
tidyverse solution. The difference here is that the nesting returns a list-column rather than a character vector column. Depending on your needs, this may or may not be better.
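library(tidyverse, warn.conflicts = FALSE)
all_transcripts %>%
  # tidyr >= 1.0.0 expects a named argument; nest(-ConvID) is deprecated
  nest(data = -ConvID) %>%
  # each element of `data` holds the rows of one conversation
  mutate(unique_names = map(data, ~ unique(.x$Name))) %>%
  select(-data)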
#> ConvID unique_names
#> 1 5 Guest, Agent
#> 2 6 Reception, Guest, Agent
#> 3 7 Reception, Guest
#> 4 8 Reception, Guest, Agent
data.table solution:
library(data.table)
setDT(all_transcripts)
all_transcripts[, .(unique_names = list(unique(Name))) , by = ConvID]
#> ConvID unique_names
#> 1: 5 Guest,Agent
#> 2: 6 Reception,Guest,Agent
#> 3: 7 Reception,Guest
#> 4: 8 Reception,Guest,Agent
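Both results above store the unique names in a list-column. If a list keyed by ConvID, or a flat character vector, is easier to work with, the same grouping can be sketched in base R. This is only an illustration on a hypothetical two-conversation subset (df_small and the other variable names are not from the original code):

```r
# Hypothetical subset of the example data (ConvID 5 and 6 only)
df_small <- data.frame(
  ConvID = c(5L, 5L, 5L, 6L, 6L, 6L, 6L),
  Name = c("Guest", "Guest", "Agent",
           "Reception", "Guest", "Agent", "Guest"),
  stringsAsFactors = FALSE
)

# split() groups Name by ConvID; unique() is then applied per group.
# The result is a named list, comparable to the list-column above.
names_by_conv <- lapply(split(df_small$Name, df_small$ConvID), unique)

# Collapse each list element into a single comma-separated string
flat <- vapply(names_by_conv, toString, character(1))

print(names_by_conv)
print(flat)
```

The list form keeps each name as a separate element, which is convenient for further filtering; the collapsed form is easier to print or write to a CSV file.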
Data:
all_transcripts <- structure(list(ConvID = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L), Name = c("Guest", "Guest",
"Agent", "Guest", "Agent", "Reception", "Guest", "Agent", "Guest",
"Guest", "Reception", "Reception", "Guest", "Guest", "Reception",
"Reception", "Guest", "Agent")), .Names = c("ConvID", "Name"), row.names = c(NA,
-18L), class = c("data.table", "data.frame"))