
environment not behaving as expected after using transformEnvir in RevoScaleR function


I have a function that reads an xdf file with rxXdfToDataFrame and uses a local variable in the rowSelection expression. If I don't pass transformEnvir=environment(), the variable is not found. My problem is that after the rxXdfToDataFrame call with transformEnvir set, I can't seem to reliably access .GlobalEnv from inside the function. If I hardcode a number into rowSelection, I don't need transformEnvir and everything works correctly. I tried setting the environment, but I'm not sure I was even doing it correctly.

The following code reproduces my problem:

envirtest = function()
{
   require(data.table)
   df = data.frame(x=1:10)
   selectnum = 5
   rxDataFrameToXdf(df, "testxdf.xdf")
   testdf = rxXdfToDataFrame("testxdf.xdf",rowSelection=(x==selectnum),transformEnvir=environment())
   testdt = setDT(testdf)
}

The error that occurs:

Error in envirtest() : could not find function "setDT"

However, if instead of setDT(), data.table::setDT() is used, then the function executes.

Edit: I forgot to mention that I had tried it without transformEnvir set and everything worked properly. Also, tables() was changed to setDT() to avoid possible confusion.


Solution

  • Here is a solution to your problem, together with a partial explanation:

    • At the completion of the transformation, the transformation environment gets cleared.
    • This means it is safer to create a dedicated environment and add any objects the transformation needs to it before calling the rx-function.

    Concretely:

    env <- new.env()
    env$selectnum = 5
    

    Set up your function like this:

    envirtest = function()
    {
      require(data.table)
      df = data.frame(x=1:10)
      env <- new.env()
      env$selectnum = 5
    
      rxDataFrameToXdf(df, "testxdf.xdf", overwrite=TRUE)
      testdf <- rxXdfToDataFrame("testxdf.xdf",
                                 rowSelection=(x==selectnum),
                                 transformEnvir=env
      )
      setDT(testdf)
    }
    

    Now try it:

    x <- envirtest()
    
    Rows Read: 10, Total Rows Processed: 10, Total Chunk Time: 0.006 seconds 
    Rows Processed: 1
    Time to read data file: 0.00 secs.
    Time to convert to data frame: less than .001 secs.
    
    str(x)
    
    Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
     $ x: int 5
     - attr(*, ".internal.selfref")=<externalptr>
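
The reasoning behind the answer can be illustrated without RevoScaleR. The following is only a minimal base-R sketch: filter_rows is a hypothetical stand-in for an rx-function, and its clean-up step simply assumes the "environment gets cleared" behaviour described above (the real internals of rxXdfToDataFrame may differ). It shows that when a dedicated environment is handed over, only that environment is emptied and the calling function's own variables are left alone.

```r
# Hypothetical stand-in for an rx-function (NOT part of RevoScaleR).
# It evaluates a row-selection expression against the data, looking up
# any extra variables in `envir`, and then clears `envir` afterwards
# (assumed behaviour, modelled on the explanation in the answer).
filter_rows <- function(df, expr, envir) {
  # Columns of `df` are found first; anything else is looked up in `envir`.
  keep <- eval(expr, envir = df, enclos = envir)
  # Simulated clean-up: remove every object from the supplied environment.
  rm(list = ls(envir), envir = envir)
  df[keep, , drop = FALSE]
}

run_demo <- function() {
  df <- data.frame(x = 1:10)

  # Dedicated environment holding only what the selection needs.
  env <- new.env()
  env$selectnum <- 5

  out <- filter_rows(df, quote(x == selectnum), env)

  list(
    result = out,
    # The dedicated environment was emptied by the clean-up ...
    env_empty = length(ls(env)) == 0,
    # ... but the caller's own local variables are untouched.
    locals_intact = exists("df", envir = environment(), inherits = FALSE)
  )
}

res <- run_demo()
```

Passing environment() instead of env in this sketch would let the clean-up step remove the function's own locals, which is the kind of side effect the dedicated environment avoids.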