Tags: r, strsplit

Find all unique values in column separated by comma


I have multiple observations of one species with different observers / groups of observers and want to create a list of all unique observers. My data look like this:

data <- read.table(text="species observer
1 A,B
1 A,B
1 B,E
1 B,E
1 D,E,A,C,C
1 F"               , header = TRUE, stringsAsFactors = FALSE)

My output should return a list of all unique observers - so:

A,B,C,D,E,F

I tried splitting the observer column with the following command, but it only returns the unique combinations of observers.

all_observers <- unique(strsplit(as.character(data$observer), ","))

all_observers
[[1]]
[1] "A" "B"

[[2]]
[1] "B" "E"

[[3]]
[1] "D" "E" "A" "C" "C"

[[4]]
[1] "F"

Solution

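  • You're almost there; you just need to unlist before you call unique. strsplit returns a list with one character vector per row, so unique on that list compares whole vectors (the combinations) rather than individual names. Flattening the list first fixes that: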

    all_observers <- unique(unlist(strsplit(as.character(data$observer), ",")))
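
If you want the single comma-separated string shown in the question, the result can be sorted and collapsed afterwards. Here is a minimal sketch using the sample data from the question; the names observers_sorted and observer_string are only illustrative and not part of the original post. Since the column is already character (stringsAsFactors = FALSE), as.character is left out here.

```r
# Sample data from the question
data <- read.table(text = "species observer
1 A,B
1 A,B
1 B,E
1 B,E
1 D,E,A,C,C
1 F", header = TRUE, stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row;
# unlist() flattens it so unique() compares individual names
all_observers <- unique(unlist(strsplit(data$observer, ",")))

# Sort alphabetically and collapse into one comma-separated string
observers_sorted <- sort(all_observers)
observer_string <- paste(observers_sorted, collapse = ",")
print(observer_string)
```

Without the sort, unique keeps the names in order of first appearance (A, B, E, D, C, F for this data), so sort is only needed if you want them alphabetical.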