How to find the path of an element in a nested list


How can I find the path of an element in a nested list without manually digging through a list in a View?

Here is an example that I can already deal with:

l1 <- list(x = list(a = "no_match", b = "test_noname", c = "test_noname"),
           y = list(a = "test_name"))

After looking for an off-the-shelf solution in other packages, I found this approach (strongly inspired by rlist::list.search):

list_search <- function(l, f) {
  ulist <- unlist(l, recursive = TRUE, use.names = TRUE)
  match <- f(ulist)
  ulist[match]
}
list_search(l1, f = \(x) x == "test_noname")
          x.b           x.c 
"test_noname" "test_noname" 

This works pretty well as it’s easy to understand that the name “x.b” here can be translated for access like this:

l1[["x"]][["b"]]
[1] "test_noname"
# Or
purrr::pluck(l1, "x", "b")
[1] "test_noname"
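
For fully named lists, this translation can be scripted. The following is only a minimal sketch, assuming that no element name itself contains a "." (otherwise the split would be ambiguous); the variable name keys is made up for illustration:

```r
l1 <- list(x = list(a = "no_match", b = "test_noname", c = "test_noname"),
           y = list(a = "test_name"))

# Split the dotted name produced by unlist() into its components.
# Assumption: no element name contains a "." itself.
keys <- strsplit("x.b", ".", fixed = TRUE)[[1]]   # "x" "b"

# `[[` indexes recursively when given a vector of names
match_value <- l1[[keys]]

# Dropping the last key returns the whole sub-list the match lives in
same_level <- l1[[head(keys, -1)]]
```

This relies on base R's recursive `[[` indexing, so no extra package is needed for the named case.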

And I can get all elements on the same level, by leaving out the last level index:

l1[["x"]]
$a
[1] "no_match"

$b
[1] "test_noname"

$c
[1] "test_noname"

This is usually my goal, as I know the values/name of one of the elements I want to get to and other similar elements are placed on the same sub-level (or sub-sub-sub-sub-sub-sub-sub-level).

However, many JSON files on the internet are not meant for easy consumption and parse into much more complicated lists that look more like this:

l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")), y = list(a = "test_name"))
str(l2)
List of 2
 $ x:List of 2
  ..$ : chr "no_match"
  ..$ :List of 2
  .. ..$ : chr "test_noname1"
  .. ..$ : chr "test_noname2"
 $ y:List of 1
  ..$ a: chr "test_name"
list_search(l2, f = \(x) x == "test_noname1")
            x2 
"test_noname1" 

From the resulting names, I would guess that the element “x2” can be accessed like this:

l2[["x2"]]
NULL
# or maybe
l2[["x"]][[2]]
[[1]]
[1] "test_noname1"

[[2]]
[1] "test_noname2"

But to not also rake in “test_noname2” here, I actually need something like this:

l2[["x"]][[2]][[1]]
[1] "test_noname1"

Background

I often need to find the path of a known value when getting data through web scraping. I might have a user name or URL that I know is somewhere in the data, but it's tedious to actually find it. Once one value is identified, it becomes easy to generalise to its siblings, which are unknown so far. In the toy example, this would look like this:

l2[["x"]][[2]]
[[1]]
[1] "test_noname1"

[[2]]
[1] "test_noname2"

Only in reality, the lists I'm working with are nested much deeper.

So the issue is essentially the unnamed elements in the list: unlist (and rapply, for that matter) does not give them names that are easy to translate back into a path. Ideally, there would be an automated way to translate these into a pluck call.


Solution

  • If the question is how to get the path given the contents of a cell, then rrapply from the package of the same name can be used:

    library(rrapply)
    
    ix <- rrapply(l2, 
      condition = \(x) x == "test_noname1",
      f = \(x, .xpos) .xpos,
      how = "flatten")
    
    unlist(ix)
    ## 11 12 13 
    ##  1  2  1 
    
    l2[[unlist(ix)]]
    ## [1] "test_noname1"
    
    library(purrr)
    pluck(l2, !!!unlist(ix))
    ## [1] "test_noname1"
    

    Note

    Input from question

    l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")),
               y = list(a = "test_name"))
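
If adding a dependency is not an option, the same positional path can be collected with a small recursive function in base R. This is only a sketch: find_paths is a hypothetical helper written for this example, not a function from rrapply or purrr, and it assumes that every leaf is a length-one value, so the predicate returns a single TRUE or FALSE.

```r
# Hypothetical helper (not part of rrapply or purrr): walk the list
# recursively and collect the positional path of every matching leaf.
find_paths <- function(l, pred, path = integer(0)) {
  if (!is.list(l)) {
    # Leaf: keep the path only when the predicate returns a single TRUE
    return(if (isTRUE(pred(l))) list(path) else list())
  }
  # Branch: recurse into each child, extending the path by its position
  out <- list()
  for (i in seq_along(l)) {
    out <- c(out, find_paths(l[[i]], pred, c(path, i)))
  }
  out
}

l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")),
           y = list(a = "test_name"))

paths <- find_paths(l2, \(x) x == "test_noname1")
path <- paths[[1]]                 # integer path: 1 2 1

match_value <- l2[[path]]          # the matched element itself
siblings <- l2[[head(path, -1)]]   # drop the last index to get its siblings
```

Dropping the last index is what generalises from the known value to its siblings, as described in the question's background. Note that head(path, -1) is empty for a top-level match, in which case the list itself already holds the siblings.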