Removing duplicates inside the nested lists of a tibble column in R


I have a tibble in which one character column contains a string I want to parse. I want to store the results of the parsing in a new list column, with no duplicates in each row.

The tibble is created by the following code:

my_tibble <- input_data_tibble |>
  group_by(tissue) |>
  summarize(id = str_flatten(id, ","))

The output I get looks like this (notice that id has type chr):

my_tibble_bad <- tibble(
  tissue = c("Duodenum", "Ileum"),
  id = c("1, 2, 5, 5", "17, 17, 10, 10, 20, 20")
)
my_tibble_bad

The output I want looks like this (notice that id is a list column, each list contains numbers, and there are no duplicates):

my_tibble_good <- tibble(
  tissue = c("Duodenum", "Ileum"),
  id = list(c(1, 2, 5), c(17, 10, 20))
)
my_tibble_good

Does anyone know how I can get the result I want, either by editing the original code or by editing its output?

I've tried a few options, and the best I have arrived at looks like this:

test_string <- "1, 1, 5, 5"
unique(as.numeric(gsub("\\D", "", unlist(strsplit(test_string, ",")))))

However, when I try to build this into my pipeline, I only get as far as:

my_tibble_bad |>
  mutate(x = strsplit(id, ",")) |>
  select(!id)

Once I add unlist, I get the error "x must be size 2 or 1, not 10.":

my_tibble_bad |> mutate(x = unlist(strsplit(id, ","))) |> select(!id)
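The error arises because strsplit() returns a list with one character vector per row (length 2 here), while unlist() flattens that list into a single vector of length 10, which no longer lines up with the 2 rows of the tibble. One possible way around it, as a sketch rather than a definitive answer, is to keep the per-row structure and clean each element with lapply(). The name my_tibble_fixed is only for illustration, and this assumes R >= 4.1 (for the native pipe) with dplyr and tibble installed:

```r
library(dplyr)
library(tibble)

my_tibble_bad <- tibble(
  tissue = c("Duodenum", "Ileum"),
  id = c("1, 2, 5, 5", "17, 17, 10, 10, 20, 20")
)

# strsplit() gives one character vector per row; lapply() processes each
# vector separately, so the result stays a list of length nrow(my_tibble_bad).
my_tibble_fixed <- my_tibble_bad |>
  mutate(
    id = lapply(
      strsplit(id, ","),
      function(parts) unique(as.numeric(trimws(parts)))  # trim spaces, convert, de-duplicate
    )
  )

my_tibble_fixed
```

Here id is overwritten in place with a list column, so the select(!id) step is not needed. purrr::map() could be used instead of lapply() if you prefer to stay within the tidyverse.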


Solution

  • Thank you @MrFlick

    So simple, I don't know how I didn't see it.

    my_tibble <- input_data_tibble |>
      group_by(tissue) |>
      summarize(id = str_flatten(id, ","))
    

    Solves the problem by not creating the problem.
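The snippet in the solution reads the same as the code in the question, so the actual change is not visible here. A guess at what "not creating the problem" could look like is to skip str_flatten() entirely and collect the unique ids of each group into a list column. This is a sketch under assumptions: the contents of input_data_tibble below are a hypothetical stand-in (one row per tissue/id pair), not the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for input_data_tibble: one row per (tissue, id) pair.
input_data_tibble <- tibble(
  tissue = c("Duodenum", "Duodenum", "Duodenum", "Duodenum", "Ileum", "Ileum", "Ileum"),
  id = c(1, 2, 5, 5, 17, 17, 10)
)

# list(unique(id)) yields one de-duplicated numeric vector per group,
# so id becomes a list column and never passes through a string.
my_tibble <- input_data_tibble |>
  group_by(tissue) |>
  summarize(id = list(unique(id)))

my_tibble
```

Because the ids are never pasted into a string, there is nothing to split or convert back to numbers afterwards.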