library(dplyr)
my_tibble <- tibble(
parentId = character(),
origId = character(),
id = character(),
value = double()
)
original_tibble <- my_tibble %>%
add_row(parentId = "2024Z123", origId = NA, id = "2024Z123", value = 2) %>%
add_row(parentId = "2024Z123", origId = "2024Z123", id = "2024A652", value = 6) %>%
add_row(parentId = "2024A652", origId = "2024Z123", id = "2024X156", value = 5) %>%
add_row(parentId = "2020G001", origId = NA, id = "2020G001", value = 1) %>%
add_row(parentId = "2020G001", origId = "2020G001", id = "2024X156", value = 2) %>%
add_row(parentId = "2021B456", origId = NA, id = "2021B456", value = 3)
final_tibble <- my_tibble %>%
add_row(parentId = "2024A652", origId = "2024Z123", id = "2024X156", value = 5) %>%
add_row(parentId = "2020G001", origId = "2020G001", id = "2024Z156", value = 2) %>%
add_row(parentId = "2021B456", origId = NA, id = "2021B456", value = 3)
The original_tibble contains 3 chains. The way the chain evolves is the first entry has parentId = id, and origId = NA and a value. The id entries are characters and the values are ints. Then for each update to the value a new row with the parentId = most recent update row id, origId = the id of the first entry, and id = unique entry for this new row, and the new value entry. The chains also aren't necessarily ordered within the rows in the way I have built the table above, so can't rely on that.
The final_tibble is the result I'm looking for. Trying to do this in dplyr but any solution is fine. What I want is to keep the most recent row entry and eliminate the previous entries from the chain.
First completing a grouping variable grp by combining parentId and origId, then looking for an id that is unique, which is automatically the most recent one.
library(dplyr)
original_tibble %>%
group_by(grp = if_else(is.na(origId), parentId, origId)) %>%
filter(!id %in% parentId | n() == 1) %>%
ungroup() %>%
select(-grp)
# A tibble: 3 × 4
parentId origId id value
<chr> <chr> <chr> <dbl>
1 2024A652 2024Z123 2024X156 5
2 2020G001 2020G001 2024X156 2
3 2021B456 NA 2021B456 3
Note, in your case you can replace if_else(...)
with coalesce(origId, parentId)