
Navigating a formula as a directed tree to find all occurrences of a symbol


In R, a formula can be seen as a tree in which the operator (the parent node) sits at index [[1]] and its left and right terms at the subsequent indices.

For example, with the formula f defined as:

f <- y ~ (a*x) +(b*x) + (c*z)

f[[1]] is ~ and f[[2]] is y. The last element, f[[3]], returns the right-hand side of the formula. Finally, f[[3]][[2]][[2]][[2]][[3]] returns the first occurrence of x.
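To make the tree structure concrete, here is a small sketch that inspects a few nodes of f using only base R indexing. It also shows that parentheses are themselves calls to `(`, which is why the path to x has more steps than the arithmetic alone would suggest.

```r
f <- y ~ (a*x) + (b*x) + (c*z)

f[[1]]                        # root operator: the symbol `~`
f[[2]]                        # left-hand side: the symbol y
f[[3]][[1]]                   # operator of the right-hand side: the symbol `+`
f[[3]][[2]][[2]]              # the parenthesised call (a * x)
f[[3]][[2]][[2]][[1]]         # the symbol `(`, the parent node of a * x
f[[3]][[2]][[2]][[2]][[3]]    # the first occurrence of x
```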

Seen that way, the equation is a tree data structure similar to this: [image: tree diagram of the formula]

How can I navigate the tree to get all the paths that lead to a given symbol?

For example, there are two paths to x: {3, 2, 2, 2, 3} and {3, 2, 3, 2, 3}. We can check with f[[3]][[2]][[2]][[2]][[3]] that we indeed get the first occurrence of x in the tree.
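Checking a path by chaining `[[` by hand gets tedious, so a small helper can follow a path one index at a time. This is only a sketch; `get_at` is a hypothetical name, not something from an existing package.

```r
f <- y ~ (a*x) + (b*x) + (c*z)

# Follow `path` from the root of `expr`, descending one index per step.
# `get_at` is an illustrative helper name.
get_at <- function(expr, path) {
  Reduce(function(node, i) node[[i]], path, expr)
}

get_at(f, c(3, 2, 2, 2, 3))   # first occurrence of x
get_at(f, c(3, 2, 3, 2, 3))   # second occurrence of x
```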


Solution

  • You can use rrapply::rrapply() to recursively search for the element and return the indices:

    library(rrapply)
    
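    # Return the index path of every node in formula `f` that equals `value`
    find_ind <- \(f, value) {
      rrapply(
        f,
        condition = \(x) x == value,   # keep only nodes matching the symbol name
        f = \(x, .xpos) .xpos,         # return the node's position vector
        how = "flatten"                # collect the results in a flat list
      )
    }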
    
    find_ind(f, "x")
    
    [[1]]
    [1] 3 2 2 2 3
    
    [[2]]
    [1] 3 2 3 2 3
    

    Confirm that these are the indices of x in the formula:

    lapply(find_ind(f, "x"), \(i) f[[i]])
    
    [[1]]
    x
    
    [[2]]
    x
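If you would rather avoid the dependency, the same search can be sketched in base R with a recursive depth-first walk. This is an alternative to the rrapply approach above, not a description of how rrapply works internally. `find_paths` is a hypothetical name, and the sketch assumes the expression contains no empty arguments (as in `x[, 1]`), which is the case for ordinary formulas.

```r
f <- y ~ (a*x) + (b*x) + (c*z)

# Depth-first walk over a language object, collecting the index path of
# every symbol whose name equals `target`. `find_paths` is illustrative.
find_paths <- function(expr, target, path = integer()) {
  if (is.symbol(expr)) {
    # Leaf node: keep the path only if this is the symbol we want
    if (identical(expr, as.name(target))) list(path) else list()
  } else if (is.call(expr)) {
    # Internal node: recurse into the operator and each argument
    out <- list()
    for (i in seq_along(expr)) {
      out <- c(out, find_paths(expr[[i]], target, c(path, i)))
    }
    out
  } else {
    # Constants such as numbers or strings never match a symbol
    list()
  }
}

paths <- find_paths(f, "x")
paths
```

Because the walk visits children left to right, the paths should come back in the same order as in the question: 3 2 2 2 3 first, then 3 2 3 2 3.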