r package environment-variables devtools cran

R "no visible binding for global variable" note when creating variables in sub-routines and returning to environment

I am trying to submit a package to CRAN. My function was several thousand lines long, so I rewrote it as a wrapper ("outside") function that calls a set of unexported "inside" sub-functions. These sub-functions create objects that I want to return to the wrapper function's environment. I have tried both assign() and list2env(); the latter does the same thing, except that it takes a named list as its argument and creates one object per named element. When I run R CMD check on my package, it raises "no visible binding for global variable" notes, because many variables are created in the sub-functions, assigned into the wrapper's environment from there, and then used in the wrapper without ever being explicitly created in it.

I have seen questions about this before online. Some of them deal specifically with ggplot, dplyr, subsetting, or data.frame issues; mine is more general. Some online references mention using the utils::globalVariables function (https://github.com/r-lib/devtools/issues/1714) to declare in advance the variables that I will create later. The forums suggest putting this call either in a separate globals.R script or at the beginning of my wrapper function. But this solution seems to be controversial and is often called a "hack". Another solution (equally "hackish", but okay I suppose) is simply to initialize all these variables as NULL at the beginning of the code.
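To make the two workarounds above concrete, here is a minimal sketch using the m and z objects from my example further down. The file name R/globals.R is just the convention the forums mention, and the globalVariables call is shown as a comment because it only matters inside a package. The runnable part shows the NULL-placeholder variant.

```r
# Workaround 1 (hypothetical R/globals.R inside the package): registers the
# names with the checker only; it does not create the variables.
# utils::globalVariables(c("m", "z"))

# Workaround 2: placeholders in the wrapper itself.
inside_function <- function() {
  # assign into the caller's (wrapper's) environment
  assign("m", matrix(stats::runif(4), ncol = 2), envir = parent.frame())
  assign("z", letters[1:10], envir = parent.frame())
  invisible(NULL)
}

outside_function <- function() {
  # placeholders give the checker a visible local binding for m and z
  m <- NULL
  z <- NULL
  inside_function()   # overwrites the placeholders in this frame
  list(m = m, z = z)
}

res <- outside_function()
dim(res$m)      # 2 2
length(res$z)   # 10
```

Both variants only silence the note; the sub-function still writes into the wrapper's environment behind its back.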

Another solution I have seen is to store all these objects as members of a list that is initialized in the wrapper function, and to have the sub-functions return their outputs so that the wrapper can append or modify the list items. In this way, the objects I want to create are not separate objects but elements of a list, so the checker has nothing to complain about. However, I would then need to significantly rewrite my code to refer to every object as a list item (e.g., tmp$obj rather than just obj). On the other hand, this would be simpler in a way, because all the objects are stored in a list that can be referred to and passed as a single unit, rather than having to keep track of them individually.

I would like to hear from people with experience about the various advantages/disadvantages or correctness of these approaches.

Returning objects to environment

outside_function <- function() {
    k <- letters[17:23]
    #inside_function creates objects m and z which did not exist before               
    inside_function()
    print(ls()) #a bare ls() is not auto-printed inside a function
    print(m)
    print(z)
    inside_function()
    print(ls())
    #z and m should now be overwritten
    print(m)
    print(z)
}

inside_function <- function() {
    m <- matrix(runif(4), ncol=2)
    z <- letters[1:10]

    #assign to the wrapping environment 
    assign("m", m, envir=parent.frame())
    assign("z", z, envir=parent.frame())
    #an equivalent way:
    list2env(list(m=m, z=z), envir=parent.frame())  

}

Alternative way, keeping objects as a list

outside_function <- function() {
    k <- letters[17:23]
    #inside_function creates objects m and z which did not exist before               
    tmp <- inside_function()

    #refer to m and z only as items in tmp
    print(tmp$m)
    print(tmp$z)

    tmp <- inside_function()
    print(ls()) #only k and tmp; m and z live inside tmp
    #z and m should now be overwritten
    print(tmp$m)
    print(tmp$z)
}

inside_function <- function() {
    m <- matrix(runif(4), ncol=2)
    z <- letters[1:10]

    #return as list items
    list(m=m, z=z)
}

For the first one, I get the following notes:

outside_function: no visible binding for global variable 'm'
outside_function: no visible binding for global variable 'z'
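As far as I know, this part of R CMD check relies on the codetools package, so the notes can be reproduced on a single function without running the full check. The sketch below assumes a fresh session in which no objects named m or z exist in the global environment (otherwise the checker finds a binding and stays silent). It collects the messages through the report argument of codetools::checkUsage instead of printing them.

```r
# Same pattern as the first example: m and z are never assigned in
# outside_function itself.
inside_function <- function() {
  assign("m", matrix(stats::runif(4), ncol = 2), envir = parent.frame())
  assign("z", letters[1:10], envir = parent.frame())
  invisible(NULL)
}

outside_function <- function() {
  inside_function()
  print(m)
  print(z)
}

# Collect the checker's messages in a character vector
notes <- character()
codetools::checkUsage(
  outside_function,
  name = "outside_function",
  report = function(msg) notes <<- c(notes, msg)
)
print(notes)
```

The checker only reads the code of outside_function; it never runs inside_function, so it cannot know that m and z will exist at run time.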

Solution

SOLUTION USING ENVIRONMENTS

So I figured out how to do this. Yes, you can use the list approach, but it is somewhat artificial. Here is the proper way: inside the wrapper function outside_function, create a new, empty environment to hold all the objects that you want to store (and return at the end). This environment is then passed as a single argument (like a list) to the inside functions. Within inside_function, you can modify the objects stored in the environment in place, without having to return them in a list and reassign them in the wrapper. It is cleaner.

outside_function <- function() {
  
  myenv <- new.env(parent = emptyenv())
  #object k exists in local environment, but not myenv
  k <- LETTERS[17:23]
  #assign list of objects to 
  print(ls()) #two objects, k and myenv
  print(ls(myenv))

  print("first run")
  inside_function(env=myenv) 
  print("LS")
  print(as.list(myenv))
  print("second run")
  inside_function(env=myenv)
  print("LS")
  print(as.list(myenv))

  #inside here, have to refer to objects as elements of myenv (myenv$m)
  #print(m) only searches this function's environment and its enclosing
  #environments; myenv is not among them, so a bare m never finds myenv$m
  #try(print(m))  
  #now create a local object m that is different
  m <- "blah"
  print(m) #gives 'blah'
  print(myenv$m)
  
  #return at end as a list
  invisible(as.list(myenv))
 
}  
inside_function <- function(env) {
  #create/overwrite objects in env
  
  env$m <- matrix(stats::runif(4), ncol=2)
  #these are created in real time within inside_function without having
  #to return env (notice NULL is a returned value)
  print(env$m)
  #overwrite
  env$m <- matrix(stats::runif(4), ncol=2)
  print(env$m)
  env$d <- 5
  print(env$d)
  env$d <- env$d + stats::runif(1)
  env$z <- letters[sample(1:20, size=6)]
  invisible(NULL)
}

tmp <- outside_function()
print(tmp) #contains all the objects as a list
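The reason inside_function does not need to return anything is that environments are modified in place, whereas a list passed to a function is copied when it is modified. A minimal sketch of the difference (modify_list and modify_env are hypothetical helpers, for illustration only):

```r
modify_list <- function(x) {
  x$d <- 5          # modifies a local copy of the list
  invisible(NULL)
}

modify_env <- function(env) {
  env$d <- 5        # modifies the caller's environment in place
  invisible(NULL)
}

lst <- list()
env <- new.env(parent = emptyenv())

modify_list(lst)
modify_env(env)

is.null(lst$d)   # TRUE: the caller's list is unchanged
env$d            # 5: the change is visible to the caller
```

With the list approach the sub-function has to return the modified list and the wrapper has to reassign it; with the environment approach that step disappears.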