I have a tibble in which one character column contains a string I want to parse. I want to store the parsed values in a new list column, with no duplicates within each row.
The tibble is created by the following code:
my_tibble <- input_data_tibble |>
  group_by(tissue) |>
  summarize(id = str_flatten(id, ","))
The output I get looks like this - notice id type is chr:
my_tibble_bad <- tibble(
  tissue = c("Duodenum", "Ileum"),
  id = c("1, 1, 5, 5", "17, 17, 10, 10, 20, 20")
)
my_tibble_bad
The output I want looks like this:
my_tibble_good <- tibble(
  tissue = c("Duodenum", "Ileum"),
  id = list(c(1, 5), c(17, 10, 20))
)
my_tibble_good
Does anyone know how I can get the result I want, either by editing the original code or by editing its output?
I've tried a few options, and the best I can arrive at for a single string looks like this:
test_string = "1, 1, 5, 5"
unique(as.numeric(gsub("\\D", "", unlist(strsplit(test_string, ",")))))
However, when I try to build this into the pipeline, I get as far as:
my_tibble_bad |>
  mutate(x = strsplit(id, ",")) |>
  select(!id)
Once I add unlist, I get the error "`x` must be size 2 or 1, not 10.":
my_tibble_bad |> mutate(x = unlist(strsplit(id, ","))) |> select(!id)
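The error comes from unlist() flattening the two per-row vectors into one vector of length 10, which no longer lines up with the 2 rows of the tibble. A minimal sketch of one way to edit the output instead, assuming the ids are always comma-separated numbers: keep the list that strsplit() returns and clean each element with lapply(). The name my_tibble_fixed is only for illustration.

```r
library(dplyr)
library(tibble)

my_tibble_bad <- tibble(
  tissue = c("Duodenum", "Ileum"),
  id = c("1, 1, 5, 5", "17, 17, 10, 10, 20, 20")
)

# strsplit() already returns a list with one element per row, so each element
# is cleaned in place instead of being flattened with unlist().
parse_ids <- function(parts) {
  unique(as.numeric(trimws(parts)))  # drop spaces, convert, de-duplicate
}

my_tibble_fixed <- my_tibble_bad |>
  mutate(x = lapply(strsplit(id, ","), parse_ids)) |>
  select(!id)

my_tibble_fixed
```

Because lapply() returns a list of length 2, mutate() accepts it as a list column with one entry per row.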
Thank you @MrFlick.
So simple, I don't know how I didn't see it:
my_tibble <- input_data_tibble |>
  group_by(tissue) |>
  summarize(id = list(unique(id)))
Solves the problem by not creating the problem.
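For anyone who wants to try this end to end, here is a minimal sketch of keeping the ids in a list column rather than flattening them to a string. The contents of input_data_tibble below are a hypothetical stand-in (one row per tissue/id pair), since the real data is not shown in the question.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for input_data_tibble: one row per (tissue, id) pair.
input_data_tibble <- tibble(
  tissue = c(rep("Duodenum", 4), rep("Ileum", 6)),
  id = c(1, 1, 5, 5, 17, 17, 10, 10, 20, 20)
)

# Wrapping unique(id) in list() yields one list element per group, so the ids
# stay numeric and are never pasted into a single string.
my_tibble <- input_data_tibble |>
  group_by(tissue) |>
  summarize(id = list(unique(id)))

my_tibble
```

With this approach there is no string to parse afterwards, so the strsplit() and gsub() steps are not needed at all.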