
Subsetting a data.cube inside a custom function


I am trying to make a function of my own to subset a data.cube in R, and format the result automatically for some predefined plots I aim to build.

This is my function.

require(data.table)
require(data.cube)

secciona <- function(cubo  = NULL, 
                     fecha_valor = list(), 
                     loc_valor = list(), 
                     prod_valor = list(), 
                     drop = FALSE){

    cubo[fecha_valor, loc_valor, prod_valor, drop = drop]

    ## The line above will really be an assignment of type y <- format(cubo[...drop])
    ## Rest of code which will end up plotting the subset of the function
}

The thing is I keep on getting the error: Error in eval(expr, envir, enclos) : object 'fecha_valor' not found

What is strangest to me is that everything works fine at the console, but not when the same subset is done inside my own function.

In console:

> dc[list(as.Date("2013/01/01"))]
> dc[list(as.Date("2013/01/01")),]
> dc[list(as.Date("2013/01/01")),,]
> dc[list(as.Date("2013/01/01")),list(),list()]

all give as result:

<data.cube>
fact:
  5627 rows x 2 dimensions x 1 measures (0.32 MB)
dimensions:
  localizacion : 4 entities x 3 levels (0.01 MB)
  producto : 153994 entities x 3 levels (21.29 MB)
total size: 21.61 MB

But whenever I try

secciona(dc)
secciona(dc, fecha_valor = list(as.Date("2013/01/01")))
secciona(dc, fecha_valor = list())

I always get the error mentioned above.

Any ideas why this is happening? Should I take a different approach to preparing the subset for plotting?


Solution

  • This is the standard issue that R users face when dealing with non-standard evaluation. It is a consequence of R's "computing on the language" feature.
    The [.data.cube function expects to be used interactively. That makes the arguments passed to it more flexible, but it also imposes some restrictions. In that respect it is similar to [.data.table when passing expressions from a wrapper function to the [ subset operator. I've added a dummy example to make it reproducible.
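To make the failure mode concrete without data.cube, here is a minimal base-R sketch. The function pick is a hypothetical toy, not part of data.cube or data.table; it is only assumed to mimic the way an NSE subset operator captures its argument and evaluates it somewhere other than the caller's frame. The names df, wanted_year, naive and fixed are likewise made up for illustration.

```r
# Hypothetical toy NSE function (not part of data.cube): it captures the
# unevaluated argument and evaluates it in `data`, then in the global
# environment, never in the frame of the function that called it.
pick <- function(data, i) {
  i_expr <- substitute(i)
  rows <- eval(i_expr, data, globalenv())
  data[rows, , drop = FALSE]
}

df <- data.frame(year = c(2014, 2015), value = c(10, 20))

# Interactive-style use works: `year` is found inside `df`.
direct <- pick(df, year == 2015)

# A naive wrapper fails: `wanted_year` exists only in the wrapper's frame,
# which `pick` never looks at. The error message is captured as a string.
naive <- function(wanted_year) pick(df, year == wanted_year)
naive_result <- tryCatch(naive(2015), error = function(e) conditionMessage(e))

# Inlining the value with substitute() before evaluating avoids the lookup.
fixed <- function(wanted_year) {
  expr <- substitute(pick(df, year == .wanted), list(.wanted = wanted_year))
  eval(expr)
}
fixed_result <- fixed(2015)
```

Here naive_result holds an "object not found" message of the same kind as in the question, while fixed_result matches the direct call, because the call that pick receives already contains the value rather than the wrapper's variable name.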

    I see you are already using the data.cube-oop branch, so this is just to clarify for other readers: the data.cube-oop branch is 92 commits ahead of the master branch. To install it, use the following.

    install.packages("data.cube", repos = paste0("https://", c(
        "jangorecki.gitlab.io/data.cube",
        "Rdatatable.github.io/data.table",
        "cran.rstudio.com"
    )))
    

    library(data.cube)
    set.seed(1)
    ar = array(rnorm(8,10,5), rep(2,3), 
               dimnames = list(color = c("green","red"), 
                               year = c("2014","2015"), 
                               country = c("IN","UK"))) # sorted
    dc = as.data.cube(ar)
    
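    # Wrapper around [.data.cube: substitute() inlines the argument values into
    # the call, so the subset operator never sees the wrapper's variable names
    f = function(color=list(), year=list(), country=list(), drop=FALSE){
        expr = substitute(
            dc[color=.color, year=.year, country=.country, drop=.drop],
            list(.color=color, .year=year, .country=country, .drop=drop)
        )
        eval(expr)  # dc is found in the enclosing (here: global) environment
    }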
    f(year=list(c("2014","2015")), country="UK")
    #<data.cube>
    #fact:
    #  4 rows x 3 dimensions x 1 measures (0.00 MB)
    #dimensions:
    #  color : 2 entities x 1 levels (0.00 MB)
    #  year : 2 entities x 1 levels (0.00 MB)
    #  country : 1 entities x 1 levels (0.00 MB)
    #total size: 0.01 MB
    

    You can inspect the generated expression by putting print(expr) before, or instead of, eval(expr).
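As a small illustration of that inspection step, the sketch below builds the same kind of call with substitute() and prints it without evaluating it, so no data.cube object is needed. The variable names year_value and drop_value are assumptions made for this example; dc appears only as a symbol inside the unevaluated call.

```r
# Values that a wrapper function would have received as arguments
# (illustrative names, not taken from the original code).
year_value <- list(c("2014", "2015"))
drop_value <- FALSE

# Build the call with the values inlined in place of the placeholders.
# Nothing is evaluated here, so `dc` does not have to exist.
expr <- substitute(
  dc[year = .year, drop = .drop],
  list(.year = year_value, .drop = drop_value)
)

# Show the call that eval() would run.
print(expr)

# The arguments of the call now hold the actual values, not variable names.
args_in_call <- as.list(expr)
```

Printing the call this way shows exactly what the subset operator receives, which makes it easy to spot a placeholder that was not replaced.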

    Read more about non-standard evaluation:
    - R Language Definition: Computing on the language
    - Advanced R: Non-standard evaluation
    - The manual page for the substitute function
    And some related SO questions:
    - Passing on non-standard evaluation arguments to the subset function
    - In R, why is [ better than subset?