I have linked documents (document trees) saved in a list (list).
Some of the document trees have items that are incomplete (marked with search=1). A tree may contain several incomplete items marked with search=1.
I would like to complete these incomplete trees using a lookup list that contains document trees (list_lookup); there is always exactly one matching tree between list and list_lookup. The level of the matching document tree should be adjusted to fit the document tree in list.
Sample data and desired output:
library(tidyverse)
# initial df1, aaa is incomplete (it is in fact linked to other documents, but this information is stored in the lookup list)
df1 <- tibble(id_from=c(NA_character_,"111","222","333","444","444","bbb"),
id_to=c("111","222","333","444","aaa","bbb","ccc"),
level=c(0,1,2,3,4,4,5),
search=c(0,0,0,0,1,0,0))
df1
#> # A tibble: 7 × 4
#> id_from id_to level search
#> <chr> <chr> <dbl> <dbl>
#> 1 <NA> 111 0 0
#> 2 111 222 1 0
#> 3 222 333 2 0
#> 4 333 444 3 0
#> 5 444 aaa 4 1
#> 6 444 bbb 4 0
#> 7 bbb ccc 5 0
# lookup dfs, df2 contains the further document links of aaa
df2 <- tibble(id_from=c(NA,"aaa","x","x"),
id_to=c("aaa","x","x1","x2"),
level=c(0,1,2,2))
df3 <- tibble(id_from=c(NA,"thank"),
id_to=c("thank","you"),
level=c(0,1))
#list with df
list <- list(df1)
#list with lookups
list_lookup <- list(df2,df3)
list_lookup
#> [[1]]
#> # A tibble: 4 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> aaa 0
#> 2 aaa x 1
#> 3 x x1 2
#> 4 x x2 2
#>
#> [[2]]
#> # A tibble: 2 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> thank 0
#> 2 thank you 1
#what I need; an updated list of dfs where information from the lookup list are included
df1_wanted <- tibble(id_from=c(NA_character_,"111","222","333","444","444","aaa","bbb","x","x"),
id_to=c("111","222","333","444","aaa","bbb","x","ccc","x1","x2"),
level=c(0,1,2,3,4,4,5,5,6,6))
list(df1_wanted)
#> [[1]]
#> # A tibble: 10 × 3
#> id_from id_to level
#> <chr> <chr> <dbl>
#> 1 <NA> 111 0
#> 2 111 222 1
#> 3 222 333 2
#> 4 333 444 3
#> 5 444 aaa 4
#> 6 444 bbb 4
#> 7 aaa x 5 <- added from df2, level adjusted
#> 8 bbb ccc 5
#> 9 x x1 6 <- added from df2, level adjusted
#> 10 x x2 6 <- added from df2, level adjusted
Created on 2023-04-01 with reprex v2.0.2
My approach:
I thought about using purrr::map to apply a function to every item of list; however, I am not sure what this function should look like.
In this solution:
Define a recursive function, get_tree(), that takes a single id and a lookup table and returns the full tree from the table for that id.
Define another function, complete_trees(), that takes a dataframe and a list of lookup tables, calls get_tree() for every id_to where search == 1 and for each lookup table, adjusts level, and binds the results to the initial dataframe.
Map complete_trees() over every element of list.
library(dplyr)
library(purrr)
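# Recursively collect every row of `lookup` that descends from `id`
get_tree <- function(id, lookup) {
  branch <- filter(lookup, id_from == id)
  # No children in this lookup table: return NULL, which bind_rows() ignores
  if (nrow(branch) == 0) return()
  bind_rows(
    branch,
    map(branch$id_to, \(x) get_tree(x, lookup))
  )
}
# Extend every item flagged with search == 1 using the lookup tables
complete_trees <- function(data, lookups) {
  branches <- pmap(
    filter(data, search == 1),
    \(id_to, level, ...) {
      bind_rows(map(
        lookups,
        \(lookup) get_tree(id_to, lookup)
      )) %>%
        # Shift the lookup levels by the level of the incomplete item
        mutate(level = level + .env$level)
    }
  )
  bind_rows(data, branches) %>%
    select(!search) %>%
    arrange(level, id_from)
}
map(list, \(x) complete_trees(x, lookups = list_lookup))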
Result:
[[1]]
# A tibble: 10 × 3
id_from id_to level
<chr> <chr> <dbl>
1 <NA> 111 0
2 111 222 1
3 222 333 2
4 333 444 3
5 444 aaa 4
6 444 bbb 4
7 aaa x 5
8 bbb ccc 5
9 x x1 6
10 x x2 6
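To see what the recursive step contributes on its own, here is a minimal sketch that calls get_tree() directly on df2 from the question and then applies the level shift by hand. The names parent_level and shifted are only illustrative, and the value 4 is the level of the aaa row in df1.

```r
library(dplyr)
library(purrr)

# Lookup tree from the question (df2)
df2 <- tibble(
  id_from = c(NA, "aaa", "x", "x"),
  id_to   = c("aaa", "x", "x1", "x2"),
  level   = c(0, 1, 2, 2)
)

# Same recursive helper as in the answer above
get_tree <- function(id, lookup) {
  branch <- filter(lookup, id_from == id)
  if (nrow(branch) == 0) return()
  bind_rows(branch, map(branch$id_to, \(x) get_tree(x, lookup)))
}

# All descendants of "aaa" in df2, with their original lookup levels
branch <- get_tree("aaa", df2)

# Illustrative offset: the level of the incomplete "aaa" row in df1
parent_level <- 4
shifted <- mutate(branch, level = level + parent_level)
shifted
```

The root row of df2 (NA to aaa) is not returned, because get_tree() only follows rows whose id_from matches the id it was given. That is why the lookup levels 1, 2, 2 become 5, 6, 6 after adding the parent level.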