I am writing a package in which I need to modify a big list through a series of functions. What are the possible ways to achieve this?
I am attaching my implementation, but I am not sure whether it is optimal.
##' @export
test <- function(param = TRUE){
  x <- list("a"= data.frame(a1 = c(1,2), a2 = c(1,1)),
            "b"= data.frame(b1 = c(2,3), b2 = c(1,2)))
  message(paste("in test() function, references to x[[1]]:", inspect(x)[["children"]][[1]][["address"]]))
  message(paste("in test() function, references to x[[2]]:", inspect(x)[["children"]][[2]][["address"]]))
  for(name in names(x)) updateList(x, name)
  message(paste("in test() function, post update references to x[[1]]:", inspect(x)[["children"]][[1]][["address"]]))
  message(paste("in test() function, post update references to x[[2]]:", inspect(x)[["children"]][[2]][["address"]]))
  x
}
updateList <- function(x, name){
  message(paste("updateList() references to x[[1]]:", inspect(x)[["children"]][[1]][["address"]]))
  message(paste("updateList() references to x[[2]]:", inspect(x)[["children"]][[2]][["address"]]))
  newdf <- rbind(x[[name]], c(4,4))
  assign("temp", newdf, envir = parent.frame(n = 1))
  with(parent.frame(n = 1), x[[name]] <- temp)
  invisible(NULL)
}
In the console, when I run test():
> library(pryr)
> test()
in test() function, references to x[[1]]: 0x55d66ce9dd98
in test() function, references to x[[2]]: 0x55d670954508
updateList() references to x[[1]]: 0x55d66ce9dd98
updateList() references to x[[2]]: 0x55d670954508
updateList() references to x[[1]]: 0x55d66fca1688
updateList() references to x[[2]]: 0x55d670954508
in test() function, post update references to x[[1]]: 0x55d66fca1688
in test() function, post update references to x[[2]]: 0x55d66ffb8208
$a
  a1 a2
1  1  1
2  2  1
3  4  4

$b
  b1 b2
1  2  1
2  3  2
3  4  4
Is there a way to make sure that R is not copying? How can I tell whether it has created copies in between?
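One way to check for copies without an extra package is base R's tracemem(), which prints a line starting with "tracemem[" whenever the marked object is duplicated. The sketch below is only an illustration on the same two data frames: it marks the element that is not modified and then changes the other one. tracemem() works only if R was built with memory profiling, so the call is guarded with capabilities("profmem").

```r
x <- list(a = data.frame(a1 = c(1, 2), a2 = c(1, 1)),
          b = data.frame(b1 = c(2, 3), b2 = c(1, 2)))

# tracemem() needs an R build with memory profiling enabled.
traced <- capabilities("profmem")

# Mark the element we are NOT going to touch.
if (traced) invisible(tracemem(x$b))

# Modify the other element. If x$b were duplicated along the way,
# R would print a line starting with "tracemem[" here.
x$a <- rbind(x$a, c(4, 4))

# Stop tracing again.
if (traced) untracemem(x$b)
```

If nothing is printed between the two tracemem calls, the marked data frame was not duplicated during the update.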
As suggested by @len-greski, we can look at the address of each element. The addresses show that at each iteration only the data frame being modified is copied; the rest are not.
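Since only the modified element gets a new address, the write into parent.frame() may not be necessary at all. A more conventional pattern, sketched here under that assumption, is to return the modified list and reassign it in the caller. The name update_list is hypothetical and is not part of the code above. Modifying one element makes at most a shallow copy of the list, so the untouched data frames are still shared.

```r
# Hypothetical replacement for updateList(): return the modified list
# instead of writing into the caller's frame.
update_list <- function(x, name) {
  x[[name]] <- rbind(x[[name]], c(4, 4))  # only this element is rebuilt
  x
}

x <- list(a = data.frame(a1 = c(1, 2), a2 = c(1, 1)),
          b = data.frame(b1 = c(2, 3), b2 = c(1, 2)))

# Reassign the result on every iteration.
for (name in names(x)) x <- update_list(x, name)
x
```

This keeps the function free of side effects on the caller's frame, which is usually easier to test inside a package.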
Using an environment is a good option.
k <- lapply(1:100000, identity)
names(k) <- as.character(1:length(k))
f <- function(x, i){x[[i]] <- x[[i]]*2; x}
system.time(for(i in 1:length(k)) f(k, as.character(i)))
This takes almost 2.5 sec on my machine.
e2 <- list2env(k, hash = FALSE)
system.time(for(i in 1:length(k)) f(e2, as.character(i)))
With an environment it takes 0.3 seconds, roughly 10 times faster. With hash = TRUE, it is 100 times faster.
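Applied to the original example, the environment approach could look like the sketch below. The function name update_env is hypothetical. Because environments have reference semantics, the function changes the caller's data directly and does not need to return anything or reach into parent.frame().

```r
# Hypothetical environment-based variant of updateList().
update_env <- function(env, name) {
  env[[name]] <- rbind(env[[name]], c(4, 4))  # modifies env in place
  invisible(NULL)
}

x <- list(a = data.frame(a1 = c(1, 2), a2 = c(1, 1)),
          b = data.frame(b1 = c(2, 3), b2 = c(1, 2)))

# Convert the list to a hashed environment once, up front.
e <- list2env(x, hash = TRUE)

for (name in names(x)) update_env(e, name)

# Convert back to a list in the original element order when needed.
result <- mget(names(x), envir = e)
```

mget() is used for the conversion back because as.list() on an environment does not guarantee the element order.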