Search code examples
r

Are doubly-linked structures allowed?


I am writing a package with an S4 class that looks like a simple tree structure: the class, say S4node, has a slot nodes which is a list of child S4node (via a common virtual ancestor, say S4base). The class also has a slot parent of type S4base that links back up to the parent of the child nodes. That looks more-or-less like this (with barebones coding for clarity):

# Create the root node, slot parent is set to NA, slot id is set to 567
> root <- new_node(567) 

# Create some child nodes, slot parent is set to root, slot nodes has new node added
> root <- add_node(root, 5671)
> root <- add_node(root, 5672)

This works ok and all slots are set correctly. Traversing down the hierarchy works, as well as traversing up the hierarchy. But here is the problem: when I want to traverse up the hierarchy the nodes slot is set to NULL, which becomes obvious when traveling down some other branch. Like this:

> node5671 <- find_node(root, 5671)

# Now traverse 5671 -> 567 -> 5672
> node5672 <- node_travel(node5671, "../5672")
Error: path not found

On the console this becomes clearer:

> length(root@nodes)
[1] 2
> node5671@parent@id
[1] 567
> nodes5671@parent@nodes
NULL

# Best one yet: take the root node, get a child, get the parent (= root), get the nodes = NULL!!!
> root@nodes[[1]]@parent@nodes
NULL

It appears to me that the list with nodes is emptied automatically, but only when implicitly accessed: I can construct the list in the first place, only when referencing the list through a @parent lookup is the list emptied. Is this to avoid endless recursion? Is it a quirk of gc()? Is there a work-around for this?


Solution

  • I can only assume that add_node modifies the nodes slot of a copy of its argument and returns the modified copy. The copy gets a new node and the original is preserved. It was not too hard to devise an example using tracemem ...

    > setClass("S4base")
    > setClass("S4root",
    +          contains = "S4base",
    +          slots = c(id = "integer", parent = "NULL", nodes = "list"),
    +          prototype = list(id = 0L, parent = NULL, nodes = list()))
    > setClass("S4node",
    +          contains = "S4base",
    +          slots = c(id = "integer", parent = "S4base", nodes = "list"),
    +          prototype = list(id = 1L, parent = new("S4root"), nodes = list()))
    > 
    > root <- new("S4root", id = 0L); tracemem(root)
    [1] "<0x1370506d8>"
    > node <- new("S4node", id = 1L, parent = root)
    > root@nodes <- list(node)
    tracemem[0x1370506d8 -> 0x1370f5aa8]: 
    > identical(root, root@nodes[[1L]]@parent)
    [1] FALSE
    > 
    > str(root)
    Formal class 'S4root' [package ".GlobalEnv"] with 3 slots
      ..@ id    : int 0
      ..@ parent: NULL
      ..@ nodes :List of 1
      .. ..$ :Formal class 'S4node' [package ".GlobalEnv"] with 3 slots
      .. .. .. ..@ id    : int 1
      .. .. .. ..@ parent:Formal class 'S4root' [package ".GlobalEnv"] with 3 slots
      .. .. .. .. .. ..@ id    : int 0
      .. .. .. .. .. ..@ parent: NULL
      .. .. .. .. .. ..@ nodes : list()
      .. .. .. ..@ nodes : list()
    >
    

    To get around copy-on-modify in R, you should probably use environments (possibly S4 classes extending environment):

    > root <- new.env(parent = emptyenv()); try(tracemem(root))
    Error in tracemem(root) : 
      'tracemem' is not useful for promise and environment objects
    > node <- new.env(parent = emptyenv())
    > 
    > root$id <- 0L
    > root$nodes <- list(node)
    > 
    > node$id <- 1L
    > node$nodes <- list()
    > node$parent <- root
    > 
    > identical(root, root$nodes[[1L]]$parent)
    [1] TRUE
    >