
Optimal way to modify a list in a function without copy in R


I am writing a package where I need to modify a big list through a series of functions. What are the possible ways to achieve this?

I am attaching my implementation, but I am not sure whether it is optimal.

##' @export
test <- function(param = TRUE){
  x <- list("a"= data.frame(a1 = c(1,2), a2 = c(1,1)),
            "b"= data.frame(b1 = c(2,3), b2 = c(1,2)))
  message(paste("in test() function, references to x[[1]]:", inspect(x)[["children"]][[1]][["address"]]))
  message(paste("in test() function, references to x[[2]]:", inspect(x)[["children"]][[2]][["address"]]))
  for(name in names(x)) updateList(x, name)
  message(paste("in test() function, post update references to x[[1]]:", inspect(x)[["children"]][[1]][["address"]]))
  message(paste("in test() function, post update references to x[[2]]:", inspect(x)[["children"]][[2]][["address"]]))
  x
}

updateList <- function(x, name){
  message(paste("updateList() references to x[[1]]:", inspect(x)[["children"]][[1]][["address"]]))
  message(paste("updateList() references to x[[2]]:", inspect(x)[["children"]][[2]][["address"]]))
  newdf <- rbind(x[[name]], c(4,4))
  assign("temp", newdf, envir = parent.frame(n = 1))
  with(parent.frame(n = 1), x[[name]] <- temp)
  invisible(NULL)
}

When I run test() in the console, I get:

> library(pryr)
> test()
in test() function, references to x[[1]]: 0x55d66ce9dd98
in test() function, references to x[[2]]: 0x55d670954508
updateList() references to x[[1]]: 0x55d66ce9dd98
updateList() references to x[[2]]: 0x55d670954508
updateList() references to x[[1]]: 0x55d66fca1688
updateList() references to x[[2]]: 0x55d670954508
in test() function, post update references to x[[1]]: 0x55d66fca1688
in test() function, post update references to x[[2]]: 0x55d66ffb8208
$a
  a1 a2
1  1  1
2  2  1
3  4  4

$b
  b1 b2
1  2  1
2  3  2
3  4  4

Is there a way to make sure that R is not copying? How can I tell whether it has created copies in between?

As suggested by @len-greski, we can inspect the address of each element. At each iteration only one data frame is copied; the rest are not.


Solution

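  • Using an environment is a good option, because environments have reference semantics: modifying one inside a function does not copy it.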

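    k <- lapply(1:100000, identity)
    names(k) <- as.character(1:length(k))
    # f doubles element i; with a list, the assignment copies the list inside f
    f <- function(x, i){x[[i]] <- x[[i]]*2; x}
    # the returned list is discarded here, so k itself is never changed
    system.time(for(i in 1:length(k)) f(k, as.character(i)))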
    

    This takes almost 2.5 seconds on my machine.

    # same data in an environment; f now updates e2 in place, without copying it
    e2 <- list2env(k, hash = FALSE)
    system.time(for(i in 1:length(k)) f(e2, as.character(i)))
    

    With an environment it takes 0.3 seconds, roughly 10 times faster. With hash = TRUE, it is 100 times faster.
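
    Applied to the data frames from the question, the same idea might look like the sketch below. The helper names make_store and update_store are hypothetical and not part of any package; the point is only that the container is an environment, so the updating function can change it without assign() or parent.frame(). Note that rbind() still builds a new data frame for the element being grown; only the copy of the container is avoided.

```r
# Hypothetical helper: build an environment holding the two data frames
# from the question (names "a" and "b" as in the original list).
make_store <- function() {
  store <- new.env(hash = TRUE)
  store$a <- data.frame(a1 = c(1, 2), a2 = c(1, 1))
  store$b <- data.frame(b1 = c(2, 3), b2 = c(1, 2))
  store
}

# Hypothetical helper: append a row to one element.
# Environments are passed by reference, so the caller sees the change.
update_store <- function(store, name) {
  store[[name]] <- rbind(store[[name]], c(4, 4))
  invisible(NULL)
}

store <- make_store()
for (name in ls(store)) update_store(store, name)

# Convert back to an ordinary named list once all updates are done.
result <- mget(c("a", "b"), envir = store)
```

    Converting back with mget() at the end keeps the environment as an internal detail, so callers of the package function can still receive a plain list.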