The themes dataset from Rebrickable includes for each theme its ID and its parent's ID (columns have been renamed here), which may recurse (ID may have grandparents, great-grandparents, etc.) Here is an example which follows the parent chain City -> Advent -> Seasonal:
themes <- data.frame(theme_id = c(206, 207, 208),
theme_name = c("Seasonal", "Advent", "City"),
parent_theme_id = c(NA, 206, 207))
Here is my base R code to follow the parent IDs up until reaching NA, meaning no parent:
for (i in 1:nrow(themes)) {
orig_id <- themes[i,]$theme_id
cur_id <- orig_id
repeat {
par_id <- themes[themes$theme_id == cur_id,]$parent_theme_id
if (is.na(par_id)) break
cur_id <- par_id
}
themes[themes$theme_id == orig_id,]$ancestor_theme_id = cur_id
}
Is there a nicer tidyverse or base R way of creating a new column that consists of the greatest ancestor IDs? I can't think of a nice way to recursively follow ID "pointers" without operating one row at a time and having a loop somewhere.
As @r2evans explains in the comments, there aren't tidyverse functions for the recursive lookup itself, but you can make your code a bit more succinct and tidyverse-y by defining a recursive function and calling it from mutate(map_dbl())
:
library(dplyr)
library(purrr)
find_ancestor <- function(id) {
dat <- cur_data()
parent <- filter(dat, theme_id == id)$parent_theme_id
if (is.na(parent)) id else find_ancestor(parent)
}
themes %>%
mutate(ancestor_theme_id = map_dbl(theme_id, find_ancestor))
theme_id theme_name parent_theme_id ancestor_theme_id
1 206 Seasonal NA 206
2 207 Advent 206 206
3 208 City 207 206