
Return list vs environment from an R function


What are the advantages and disadvantages of using one over the other in the following two cases? Case I returns its output as an environment, and Case II returns its output as a list.

Case I:

function(x) {
  ret <- new.env()
  ret$x <- x
  ret$y <- x^2
  return(ret)
}

Case II:

function(x) {
  ret <- list()
  ret$x <- x
  ret$y <- x^2
  return(ret)
}

Solution

  • Although they look similar, returning a list and returning an environment are not the same thing. From Advanced R:

    Generally, an environment is similar to a list, with four important exceptions:

    • Every name in an environment is unique.

    • The names in an environment are not ordered (i.e., it doesn’t make sense to ask what the first element of an environment is).

    • An environment has a parent.

    • Environments have reference semantics.

    More technically, an environment is made up of two components, the frame, which contains the name-object bindings (and behaves much like a named list), and the parent environment. Unfortunately “frame” is used inconsistently in R. For example, parent.frame() doesn’t give you the parent frame of an environment. Instead, it gives you the calling environment. This is discussed in more detail in calling environments.
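
The reference-semantics point is the one that matters most when choosing a return type, so here is a minimal sketch of it. The names `lst_a`, `env_a` and the helper `bump` are made up for illustration.

```r
# Lists are copied on modification.
lst_a <- list(x = 1)
lst_b <- lst_a            # behaves like an independent copy
lst_b$x <- 99
lst_a$x                   # still 1

# Environments are shared by reference.
env_a <- new.env()
env_a$x <- 1
env_b <- env_a            # a second name for the SAME environment
env_b$x <- 99
env_a$x                   # now 99

# The same holds across function calls (bump is a hypothetical helper).
bump <- function(obj) {
  obj$x <- obj$x + 1      # modifies a local copy for a list, the original for an environment
  invisible(NULL)
}
bump(lst_a)
bump(env_a)
lst_a$x                   # unchanged: 1
env_a$x                   # incremented: 100
```

So a function that returns an environment hands the caller a mutable, shared object, while a function that returns a list hands back a value that later code cannot change behind the caller's back.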

    From the help:

    help(new.env)
    

    Environments consist of a frame, or collection of named objects, and a pointer to an enclosing environment. The most common example is the frame of variables local to a function call; its enclosure is the environment where the function was defined (unless changed subsequently). The enclosing environment is distinguished from the parent frame: the latter (returned by parent.frame) refers to the environment of the caller of a function. Since confusion is so easy, it is best never to use ‘parent’ in connection with an environment (despite the presence of the function parent.env).
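
To make the enclosure versus caller distinction from the help text concrete, here is a small sketch. The function names `inner` and `outer_fn` are hypothetical.

```r
inner <- function() {
  enclosure <- parent.env(environment())  # environment where inner was defined
  caller    <- parent.frame()             # environment inner was called from
  list(enclosure = enclosure, caller = caller)
}

outer_fn <- function() {
  out <- inner()                # inner is called from outer_fn's frame
  out$outer_frame <- environment()
  out
}

out <- outer_fn()
identical(out$enclosure, environment(inner))  # TRUE: the enclosure is fixed at definition
identical(out$caller, out$outer_frame)        # TRUE: the caller depends on the call site
identical(out$enclosure, out$caller)          # FALSE: they are different environments
```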

    From the function's documentation:

    e1 <- new.env(parent = baseenv())  # this one has enclosure package:base.
    e2 <- new.env(parent = e1)
    assign("a", 3, envir = e1)
    ls(e1)
    #[1] "a"
    

    However, a bare ls() lists the objects in the current environment, which here are the two environments just created:

    ls()
    #[1] "e1" "e2"
    

    And you can access the objects in your environment just as you would in a list:

    e1$a
    #[1] 3
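
The list-like access only goes so far. A short sketch of what works and what does not, using a fresh environment called `env_demo` (a name chosen here for illustration):

```r
env_demo <- new.env()
assign("a", 3, envir = env_demo)

env_demo$a                                       # access by name with $
env_demo[["a"]]                                  # access by name with [[
get("a", envir = env_demo)                       # access with get()
exists("a", envir = env_demo, inherits = FALSE)  # test for a binding
is.null(env_demo$missing_name)                   # absent names give NULL, as in a list

# Positional indexing is not defined for environments, because names are unordered.
by_position <- tryCatch(env_demo[[1]], error = function(err) "error")
by_position
```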
    

    Playing with your functions:

    f1 <- function(x) {
       ret <- new.env()
       ret$x <- x
       ret$y <- x^2
       return(ret)
    }
    
    res <- f1(2)
    res
    #<environment: 0x0000021d55a8a3e8>
    
    res$y
    #[1] 4
    
    f2 <- function(x) {
       ret <- list()
       ret$x <- x
       ret$y <- x^2
       return(ret)
    }
    
    res2 <- f2(2)
    res2
    #$x
    #[1] 2
    
    #$y
    #[1] 4
    
    res2$y
    #[1] 4
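
Printing an environment only shows its address, so inspecting the result takes an extra step compared with a list. A minimal sketch using base R tools, with `f1` redefined so the block runs on its own:

```r
f1 <- function(x) {
  ret <- new.env()
  ret$x <- x
  ret$y <- x^2
  ret
}

res <- f1(2)
ls(res)                                   # names of the bindings, sorted alphabetically
vals <- mget(c("x", "y"), envir = res)    # fetch several values as a named list
vals
ls.str(res)                               # compact overview of names and values
```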
    

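    Microbenchmarking the two definitions gives similar timings. Note that the expressions below only create the function objects and never call them, so the figures reflect closure creation rather than the cost of building the environment or the list: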

    microbenchmark::microbenchmark(
       function(x) {
          ret <- new.env()
          ret$x <- x
          ret$y <- x^2
          return(ret)
       },
       function(x) {
          ret <- list()
          ret$x <- x
          ret$y <- x^2
          return(ret)
       },
       times = 500L
    )
    
    #Unit: nanoseconds
    #                                                                                 expr min lq   mean median  uq  max neval
    # function(x) {     ret <- new.env()     ret$x <- x     ret$y <- x^2     return(ret) }   0  1 31.802      1 100  801   500
    #    function(x) {     ret <- list()     ret$x <- x     ret$y <- x^2     return(ret) }   0  1 37.802      1 100 2902   500
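
To time the calls themselves rather than the function definitions, something like the following sketch could be used. It falls back to base R's `system.time` so it runs without extra packages; the iteration count `n` is an arbitrary choice, and no timings are shown because they depend on the machine and R version.

```r
f1 <- function(x) {
  ret <- new.env()
  ret$x <- x
  ret$y <- x^2
  ret
}

f2 <- function(x) {
  ret <- list()
  ret$x <- x
  ret$y <- x^2
  ret
}

n <- 10000L  # arbitrary number of repetitions

# Base R timing of repeated calls.
t_env  <- system.time(for (i in seq_len(n)) f1(2))[["elapsed"]]
t_list <- system.time(for (i in seq_len(n)) f2(2))[["elapsed"]]
c(env = t_env, list = t_list)

# Finer-grained timing, only if microbenchmark is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(env = f1(2), list = f2(2), times = 500L))
}
```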
    

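    Note that object.size is a weak basis for comparison, because it does not count the objects stored inside an environment; the sizes reported were: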

    object.size(res)
    #464 bytes
    
    object.size(res2)
    #464 bytes
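
A caveat on measuring environments this way: `object.size` does not descend into an environment, so the figure it reports does not grow with the contents. The sketch below illustrates this with two hypothetical environments, `small_env` and `big_env`, and shows one rough base R workaround, which is to measure the contents after converting them to a list.

```r
small_env <- new.env()
big_env <- new.env()
big_env$payload <- numeric(1e5)   # roughly 800 kB of doubles

# Same reported size, despite very different contents.
as.numeric(object.size(small_env)) == as.numeric(object.size(big_env))

# Rough estimate of what the environment holds: measure its contents as a list.
contents_size <- object.size(as.list(big_env, all.names = TRUE))
print(contents_size, units = "Kb")
```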
    

    You can always build an environment from a list (list2env) and a list from an environment (as.list):

    L <- list(a = 1, b = 2:4, p = pi, ff = gl(3, 4, labels = LETTERS[1:3]))
    e <- list2env(L)
    e$ff
    # [1] A A A A B B B B C C C C
    #Levels: A B C
    
    as.list(e)
    #$ff
    # [1] A A A A B B B B C C C C
    #Levels: A B C
    #
    #$p
    #[1] 3.141593
    #
    #$b
    #[1] 2 3 4
    #
    #$a
    #[1] 1
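
As the output above shows, the round trip does not preserve element order. A minimal sketch of restoring it, using a smaller list `L0` made up for this example:

```r
L0 <- list(a = 1, b = 2:4, p = pi)
e0 <- list2env(L0)

back <- as.list(e0)               # order of elements is not guaranteed
setequal(names(back), names(L0))  # same names, possibly in a different order

restored <- back[names(L0)]       # reorder using the original names
identical(restored, L0)           # TRUE

names(as.list(e0, sorted = TRUE)) # alternatively, ask for alphabetically sorted names
```

Reordering by the original names only works because every name in an environment is unique; a list with duplicated names would not survive the round trip intact.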