
R pipe, mget, and environments


I'm posting this in hopes someone could explain the behavior here. And perhaps this may save others some time in tracking down how to fix a similar error.

The answer is likely somewhere here in this vignette by Hadley Wickham and Lionel Henry. Yet it will take someone like me weeks of study to connect the dots.

I am running a number of queries against a remote database and then combining the results into a single data.table. I add the "part_" prefix to the name of each individual query result and use ls() and mget() with data.table's rbindlist() to combine them.

This works:

results_all <- rbindlist(mget(ls(pattern = "part_", )))

I probably learned that approach from "list data.tables in memory and combine by row (rbind)", and it is certainly a helpful thing to know how to do.

For readability, I often prefer using the magrittr pipe (or chaining with data.table) and especially so with projects like this because I use dplyr to query the database. Yet this code results in an error:

results_all <- ls(pattern = "part_", ) %>% 
 mget() %>%
 rbindlist()

The error reads "Error: value for ‘part_a’ not found", where part_a is the first object name in the character vector returned by ls().

Searching for that error message, I came across the discussion in this data.table GitHub issue. After reading through it, I tried setting inherits = TRUE within mget() like so:

results_all <- ls(pattern = "part_", ) %>% 
 mget(inherits = TRUE) %>%
 rbindlist()

And that works. So the error is happening when piping the result of ls() to mget(). And given that nesting ls() within mget() works, my guess is that it is something to do with the pipe and "the enclosing frames of the environment".

In writing this up, I came across "Unexpected error message while joining data.table with rbindlist() using mget()". From the discussion there I found out that this also works:

results_all <- ls(pattern = "part_", ) %>% 
 mget(envir = .GlobalEnv) %>%
 rbindlist()

Again, I am hoping someone can explain what is going on for folks looking to learn more about how environments work in R.

Edit: Adding reproducible example

Per the request for a reproducible example, running the code above with these three data.tables (data.frames or tibbles will behave the same) should do it.

library(data.table)  # provides data.table() and rbindlist()

part_a <- data.table(col1 = 1:10, col2 = sample(letters, 10))

part_b <- data.table(col1 = 11:20, col2 = sample(letters, 10))

part_c <- data.table(col1 = 21:30, col2 = sample(letters, 10))

Solution

  • The rhs argument to a pipe operator (in your example, the expression mget()) is never evaluated by the interpreter as the function call you wrote. The pipe operator is an infix function that performs non-standard evaluation of its second argument (rhs): it captures the rhs expression unevaluated, uses it as a sort of "template" to compose a new function call, and then performs that call itself.

    The calling environment of this new function call is the function environment of %>%, not the calling environment of the lhs function or the global environment. .GlobalEnv and the calling environment of the lhs function happen to be the same environment in your example, and that environment is a parent to the function environment of %>%, which is why inherits = TRUE or setting the environment to .GlobalEnv works for you.
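The mechanism described above can be sketched in base R without magrittr. This is only a rough illustration: toy_pipe() is a hypothetical stand-in written for this example, not magrittr's actual implementation (which differs and has changed across versions). It uses base data.frames and do.call(rbind, ...) in place of data.table and rbindlist() so that it runs without extra packages, and top_env is an assumed name for the environment the objects live in (.GlobalEnv in a normal session).

```r
# Two small results, standing in for the part_* query results.
part_a <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"))
part_b <- data.frame(col1 = 4:6, col2 = c("d", "e", "f"))
part_names <- c("part_a", "part_b")

# The environment that holds part_a and part_b
# (.GlobalEnv when run at the top level of a session or script).
top_env <- environment()

# Hypothetical toy pipe: captures the right-hand side unevaluated, splices
# the left-hand value in as the first argument, and evaluates the new call
# from inside its own function frame.
toy_pipe <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)  # e.g. the unevaluated call mget()
  new_call <- as.call(c(list(rhs_call[[1]], lhs), as.list(rhs_call)[-1]))
  eval(new_call)               # evaluated in toy_pipe()'s frame
}

# Direct call: mget() looks in the environment it is called from.
direct <- mget(part_names)

# Through the toy pipe: mget() now looks in toy_pipe()'s frame,
# which does not contain part_a, so the lookup fails.
failed <- tryCatch(
  toy_pipe(part_names, mget()),
  error = function(e) conditionMessage(e)
)

# inherits = TRUE also searches the enclosing environments of that frame.
via_inherits <- do.call(rbind, toy_pipe(part_names, mget(inherits = TRUE)))

# Naming the environment explicitly avoids the problem altogether.
via_envir <- do.call(rbind, toy_pipe(part_names, mget(envir = top_env)))
```

Here failed holds the error message rather than a list, while via_inherits and via_envir both contain the six combined rows. Passing envir explicitly is arguably the more robust choice, since it does not depend on how the call happens to be evaluated.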