Keep unique elements of each vector in a list of vectors


I have a dataframe with 1.6 million rows and one of the columns is a list of character vectors.

Each element of this list column looks as follows: c("A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "B05B").

I would like it to be c("A61K", "A61Q", "B05B").

In other words, I just want to keep the unique values within each vector. This should be done for every row.

I have tried this:

sapply(strsplit(try, "|", function(x) paste0(unique(x), collapse = ",")))

I have also tried solutions using for loops, but they take very long and R stops responding.


Solution

  • You can handle it using unique() within lapply():

    # example df with list column
    dat <- data.frame(id = 1:2)
    dat$x <- list(
      c("A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "B05B"),
      c("A62K", "A61K", "A61K", "A58J", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "C97B")
    )
    
    dat
    #>   id                                                                      x
    #> 1  1 A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61Q, B05B
    #> 2  2 A62K, A61K, A61K, A58J, A61K, A61K, A61K, A61K, A61K, A61K, A61Q, C97B
    
    # remove duplicates within list column by row
    dat$x <- lapply(dat$x, unique)
    
    dat
    #>   id                            x
    #> 1  1             A61K, A61Q, B05B
    #> 2  2 A62K, A61K, A58J, A61Q, C97B
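
If the column is actually stored as delimited strings rather than as a list (which the strsplit() attempt in the question hints at), something along these lines may work. This is only a sketch: the vector `codes` and its "|" separator are made up for illustration and are not taken from the question's data. Note that "|" is a regex metacharacter, so strsplit() needs fixed = TRUE to split on it literally.

```r
# hypothetical input: one pipe-delimited string per row (an assumption)
codes <- c("A61K|A61K|A61Q|B05B", "A62K|A61K|A61K|A58J")

# fixed = TRUE treats "|" as a literal character, not regex alternation
parts <- strsplit(codes, "|", fixed = TRUE)

# de-duplicate each vector, keeping first-occurrence order
uniq <- lapply(parts, unique)

# optionally collapse each vector back into a single comma-separated string
collapsed <- vapply(uniq, paste, character(1), collapse = ",")
```

Here `uniq` is a list of character vectors that can be assigned to a data frame column in the same way as dat$x above, while `collapsed` is a plain character vector with one string per row.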