I am writing a package with an S4 class that looks like a simple tree structure: the class, say S4node
, has a slot nodes
which is a list of child S4node
(via a common virtual ancestor, say S4base
). The class also has a slot parent
of type S4base
that links back up to the parent of the child nodes. That looks more-or-less like this (with barebones coding for clarity):
# Create the root node, slot parent is set to NA, slot id is set to 567
> root <- new_node(567)
# Create some child nodes, slot parent is set to root, slot nodes has new node added
> root <- add_node(root, 5671)
> root <- add_node(root, 5672)
This works ok and all slots are set correctly. Traversing down the hierarchy works, as well as traversing up the hierarchy. But here is the problem: when I want to traverse up the hierarchy the nodes
slot is set to NULL
, which becomes obvious when traveling down some other branch. Like this:
> node5671 <- find_node(root, 5671)
# Now traverse 5671 -> 567 -> 5672
> node5672 <- node_travel(node5671, "../5672")
Error: path not found
On the console this becomes clearer:
> length(root@nodes)
[1] 2
> node5671@parent@id
[1] 567
> nodes5671@parent@nodes
NULL
# Best one yet: take the root node, get a child, get the parent (= root), get the nodes = NULL!!!
> root@nodes[[1]]@parent@nodes
NULL
It appears to me that the list with nodes is emptied automatically, but only when implicitly accessed: I can construct the list in the first place, only when referencing the list through a @parent
lookup is the list emptied. Is this to avoid endless recursion? Is it a quirk of gc()
? Is there a work-around for this?
I can only assume that add_node
modifies the nodes
slot of a copy of its argument and returns the modified copy. The copy gets a new node and the original is preserved. It was not too hard to devise an example using tracemem
...
> setClass("S4base")
> setClass("S4root",
+ contains = "S4base",
+ slots = c(id = "integer", parent = "NULL", nodes = "list"),
+ prototype = list(id = 0L, parent = NULL, nodes = list()))
> setClass("S4node",
+ contains = "S4base",
+ slots = c(id = "integer", parent = "S4base", nodes = "list"),
+ prototype = list(id = 1L, parent = new("S4root"), nodes = list()))
>
> root <- new("S4root", id = 0L); tracemem(root)
[1] "<0x1370506d8>"
> node <- new("S4node", id = 1L, parent = root)
> root@nodes <- list(node)
tracemem[0x1370506d8 -> 0x1370f5aa8]:
> identical(root, root@nodes[[1L]]@parent)
[1] FALSE
>
> str(root)
Formal class 'S4root' [package ".GlobalEnv"] with 3 slots
..@ id : int 0
..@ parent: NULL
..@ nodes :List of 1
.. ..$ :Formal class 'S4node' [package ".GlobalEnv"] with 3 slots
.. .. .. ..@ id : int 1
.. .. .. ..@ parent:Formal class 'S4root' [package ".GlobalEnv"] with 3 slots
.. .. .. .. .. ..@ id : int 0
.. .. .. .. .. ..@ parent: NULL
.. .. .. .. .. ..@ nodes : list()
.. .. .. ..@ nodes : list()
>
To get around copy-on-modify in R, you should probably use environments (possibly S4 classes extending environment
):
> root <- new.env(parent = emptyenv()); try(tracemem(root))
Error in tracemem(root) :
'tracemem' is not useful for promise and environment objects
> node <- new.env(parent = emptyenv())
>
> root$id <- 0L
> root$nodes <- list(node)
>
> node$id <- 1L
> node$nodes <- list()
> node$parent <- root
>
> identical(root, root$nodes[[1L]]$parent)
[1] TRUE
>