Navigating a formula as a directed tree to find all occurrences of a symbol

In R, a formula can be seen as a tree where the parent node (the operator) is located at index [[1]] and its left and right terms at the subsequent indices.

For example, with the formula f defined as:

f <- y ~ (a*x) + (b*x) + (c*z)

f[[1]] is ~ and f[[2]] is y. The last term, f[[3]], returns the right-hand side of the formula. Drilling further down, f[[3]][[2]][[2]][[2]][[3]] returns the first occurrence of x.

Seen that way, the formula is a tree data structure similar to this:
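One way to see this nesting is to print it with a small recursive helper. The following is a minimal base-R sketch: `tree_lines` is a hypothetical name introduced here for illustration, and `<call>` is only a placeholder label for a nested call. Each line shows the [[i]] index of a node inside its parent, indented by depth.

```r
f <- y ~ (a*x) + (b*x) + (c*z)

# Hypothetical helper (not from any package): returns one line per node,
# labelled with its [[i]] index inside the parent and indented by depth.
tree_lines <- function(node, depth = 0) {
  out <- character(0)
  indent <- strrep("  ", depth)
  for (i in seq_along(node)) {
    child <- node[[i]]
    if (is.call(child)) {
      # Nested call: print a placeholder, then recurse into its elements.
      out <- c(out,
               paste0(indent, "[[", i, "]] <call>"),
               tree_lines(child, depth + 1))
    } else {
      # Leaf: a symbol such as x or +, or a constant.
      label <- if (is.symbol(child)) as.character(child) else deparse(child)
      out <- c(out, paste0(indent, "[[", i, "]] ", label))
    }
  }
  out
}

lines <- tree_lines(f)
writeLines(lines)
```

Note that the parentheses are nodes too: (a*x) is a call to `(` whose only argument is a*x, which is why the paths to x contain an extra 2.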

How could I navigate through the tree to get all the paths that lead to a given symbol?

For example, to get to x, there are two paths: {3, 2, 2, 2, 3} and {3, 2, 3, 2, 3}. Indexing with f[[3]][[2]][[2]][[2]][[3]] confirms that the first path does lead to the first occurrence of x in the tree.

Solution

You can use rrapply::rrapply() to recursively search for the element and return the indices:

library(rrapply)

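# Return the index path of every leaf of f that matches value.
# The \(x) lambda shorthand needs R >= 4.1.
find_ind <- \(f, value) {
  rrapply(
    f,
    condition = \(x) x == value,  # keep only leaves equal to the symbol name
    f = \(x, .xpos) .xpos,        # replace each match with its position vector
    how = "flatten"               # return the matches as a flat list
  )
}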

find_ind(f, "x")

[[1]]
[1] 3 2 2 2 3

[[2]]
[1] 3 2 3 2 3

Confirm that these are the indices of x in the formula:

lapply(find_ind(f, "x"), \(i) f[[i]])

[[1]]
x

[[2]]
x
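If you would rather avoid the dependency, the same search can be sketched in base R with a recursive function. This is a minimal sketch and not part of rrapply: `find_paths` is a hypothetical name, and it assumes the formula contains no empty arguments (such as the blank in m[1, ]).

```r
f <- y ~ (a*x) + (b*x) + (c*z)

# Hypothetical base-R helper: walk the call tree depth-first and collect
# the index path of every symbol whose name equals `target`.
find_paths <- function(node, target, path = integer(0)) {
  if (is.call(node)) {
    # Recurse into every element, extending the path with its index.
    hits <- lapply(seq_along(node), function(i) {
      find_paths(node[[i]], target, c(path, i))
    })
    do.call(c, hits)  # merge the per-child lists of paths into one list
  } else if (is.symbol(node) && identical(node, as.name(target))) {
    list(path)        # matching leaf: report the path that led here
  } else {
    list()            # non-matching leaf: nothing to report
  }
}

paths <- find_paths(f, "x")
paths
```

Because the function symbol at index 1 of each call is visited as well, the same helper can also locate operators, for example find_paths(f, "*").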