In R, a formula can be seen as a tree where the parent node (the operator) is located at index [[1]] and the left and right terms at the subsequent indices. For example, with the formula f defined as:
f <- y ~ (a*x) +(b*x) + (c*z)
f[[1]] is ~ and f[[2]] is y. The last term, f[[3]], returns the right-hand side of the equation. Going deeper, f[[3]][[2]][[2]][[2]][[3]] returns the first occurrence of x.
Seen that way, the equation is a tree data structure similar to this:
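One way to make the tree visible in text form is to walk the formula recursively and print every node together with the index path that reaches it. The following is only a sketch in base R; tree_lines is a hypothetical helper name introduced for illustration, not something from a package.

```r
# Hypothetical helper (not from the original post): one text line per node,
# labelled with the index path that reaches it from the root.
tree_lines <- function(expr, path = integer(0)) {
  label <- if (is.call(expr)) "<call>" else deparse(expr)
  line <- paste0(strrep("  ", length(path)),
                 "[", paste(path, collapse = ","), "] ", label)
  if (!is.call(expr)) return(line)
  # Recurse into every element of the call: [[1]] is the operator or function,
  # the remaining elements are its arguments.
  children <- lapply(seq_along(expr), function(i) tree_lines(expr[[i]], c(path, i)))
  c(line, unlist(children))
}

f <- y ~ (a*x) + (b*x) + (c*z)
tree_txt <- tree_lines(f)
writeLines(tree_txt)
```

In the printed tree, the parentheses show up as nodes of their own (a call to `(`), which is why the path to the first x is five levels deep.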
How could I navigate through the tree to get all the paths that lead to a certain symbol?
For example, there are two paths that lead to x: {3, 2, 2, 2, 3} and {3, 2, 3, 2, 3}. We can check with f[[3]][[2]][[2]][[2]][[3]] that the first path indeed gives the first occurrence of x in the tree.
You can use rrapply::rrapply() to recursively search for the element and return the indices:
library(rrapply)
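# Return the index path of every leaf in f that equals value
find_ind <- \(f, value) {
  rrapply(
    f,
    condition = \(x) x == value,  # keep only the matching leaves
    f = \(x, .xpos) .xpos,        # return the leaf's position vector
    how = "flatten"               # collect the results in a flat list
  )
}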
find_ind(f, "x")
[[1]]
[1] 3 2 2 2 3
[[2]]
[1] 3 2 3 2 3
Confirm that these are the indices of x in the formula:
lapply(find_ind(f, "x"), \(i) f[[i]])
[[1]]
x
[[2]]
x
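If adding a dependency is not an option, the same search can be sketched in base R with a small recursive function. This is an illustrative alternative, not the rrapply implementation; find_paths is a hypothetical name, and the sketch assumes the formula contains no calls with empty arguments (such as x[1, ]).

```r
# Hypothetical helper (not part of rrapply): collect the index path of every
# symbol in expr whose name equals target.
find_paths <- function(expr, target, path = integer(0)) {
  if (is.call(expr)) {
    # Visit each element of the call and extend the path with its index.
    hits <- lapply(seq_along(expr), function(i) find_paths(expr[[i]], target, c(path, i)))
    return(do.call(c, hits))
  }
  # Leaf: keep the path only if this is the symbol we are looking for.
  if (is.symbol(expr) && identical(as.character(expr), target)) list(path) else list()
}

f <- y ~ (a*x) + (b*x) + (c*z)
paths <- find_paths(f, "x")

# Follow each path one index at a time to confirm that it ends at x.
found <- lapply(paths, function(p) Reduce(function(node, i) node[[i]], p, f))
```

Here paths holds one integer vector per occurrence of x, in depth-first order, and found holds the symbol reached by following each path.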