r, dplyr, scope, data-wrangling, magrittr

Behavior of %>% when piping values to functions containing pipes


The examples below show that passing an object to deparse() and substitute() produces different output depending on whether the object is passed with %>% and whether the calls are nested inside another function. Can someone explain why these examples produce different results? The more general question is what %>% does when it passes objects to functions that themselves contain %>%.

Examples

Examples 1 and 2 show that passing the object iris to substitute() and deparse() outputs "iris" whether or not we use a %>% pipe chain. This also suggests that iris %>% substitute() is not simply assigning iris to . and then passing that to substitute(), since otherwise we'd expect an output of ".".

library(magrittr)

# 1. No pipes
deparse(substitute(iris)) 
#> [1] "iris"

# 2. Pipe chain
iris %>% substitute() %>% deparse()
#> [1] "iris"

Let's define two custom functions: one passes its input through substitute() and deparse() with %>%, while the other nests the two calls without pipes.

# Function: doesn't contain pipes
nopipe <- function(data) {
  deparse(substitute(data))
}

# Function: contains pipes
pipe <- function(data) {
  data %>% substitute() %>% deparse()
}

Let's pass iris to those custom functions. Examples 5 and 6 pass iris to the functions using %>% but examples 3 and 4 don't. The custom functions for examples 4 and 6 contain %>% but the custom functions for examples 3 and 5 don't.

# 3. No pipes, contains custom function
nopipe(iris)
#> [1] "iris"

# 4. Pipes only inside custom function
pipe(iris)
#> [1] "data"

# 5. Pipes only outside custom function
iris %>% nopipe()
#> [1] "."

# 6. Pipes inside and outside custom function
iris %>% pipe()
#> [1] "data"

The fact that 3 outputs the same as 1 and 2 suggests the custom function is effectively replacing data with the iris object. That 4 and 6 output "data" regardless of whether iris is passed with %>% suggests that using %>% inside the custom function is somehow "overriding" that replacement, whatever value the function receives. Finally, the fact that 5 returns "." suggests that, when the custom function isn't ignoring the value of data because it contains %>%, piping iris to the function assigns iris to the name . (a dot). Overall, this is quite confusing: 2 seems to demonstrate that %>% isn't assigning to ., and it's unclear why %>% causes the "overriding." While the %>% documentation discusses issues with evaluation order in nested functions, this doesn't appear to be such an issue.


Solution

  • When you call the %>% function, it lazily grabs the left-hand and right-hand sides of the expression as promises. It then assigns the left-hand side promise to a variable named . (a dot), which is passed to the function. The main takeaway is that substitute only unrolls one level of promises.

    A promise is basically an unevaluated piece of code. R creates one for each function argument, but you rarely create one directly (base R's delayedAssign() is the main tool for that). The substitute function can access the unevaluated expression from a promise as long as it hasn't been "collapsed" by evaluation. But substitute can only go one level up. Consider for example

    a <- function(a) substitute(a)
    b <- function(b) a(b)
    
    a(one)
    # one
    b(one)
    # b
    

    When we call a(one), substitute takes a and goes "up" a step to see that we passed in one. But when we call b(one), substitute only goes up one step, a -> b, rather than all the way a -> b -> one.
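
The same one-step rule holds however deep the call chain gets. Here is a minimal sketch with three hypothetical wrapper functions (inner, middle and outer are illustrative names, not part of the original example): substitute always reports the expression written at the nearest call site, never the one at the top of the chain.

```r
# Hypothetical helpers: each level simply forwards its argument.
inner  <- function(x) deparse(substitute(x))
middle <- function(y) inner(y)
outer  <- function(z) middle(z)

# `one` is never evaluated, so it does not need to exist.
res_inner  <- inner(one)   # "one": the call site says inner(one)
res_middle <- middle(one)  # "y":   the call site says inner(y)
res_outer  <- outer(one)   # "y":   still inner(y), not z or one

print(c(res_inner, res_middle, res_outer))
```

Adding more layers above middle changes nothing, because inner only ever sees the promise created by the call inner(y).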

    Let's pretend that a function promise() exists that we can call to make a promise. The pipe function essentially takes the promise of the left-hand side expression and assigns it to the name . (a dot). So iris %>% substitute() is like

    . <- promise(iris)
    substitute(.)
    

    So substitute can unwrap the . value one level to get iris. When you call iris %>% nopipe(), it's like calling

    . <- promise(iris)
    nopipe(.)
    

    and since nopipe is calling substitute(data), it can only go up one step from data -> . and can't go two steps to data -> . -> iris.
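
The two pseudocode cases above can be approximated in base R without magrittr. This is only a sketch: delayedAssign() stands in for the pretend promise() function, and sim_direct and sim_nopipe are hypothetical names. It mimics what the pipe does conceptually and is not a copy of magrittr's implementation.

```r
# Same definition as in the question.
nopipe <- function(data) {
  deparse(substitute(data))
}

# Roughly `iris %>% substitute() %>% deparse()`:
# `.` is a promise whose expression is `iris`.
sim_direct <- function() {
  delayedAssign(".", iris)
  deparse(substitute(.))    # one step: . -> iris
}

# Roughly `iris %>% nopipe()`:
# nopipe's `data` is a promise whose expression is `.`.
sim_nopipe <- function() {
  delayedAssign(".", iris)
  nopipe(.)                 # one step: data -> . (never reaches iris)
}

direct_result <- sim_direct()   # "iris"
nopipe_result <- sim_nopipe()   # "."
print(c(direct_result, nopipe_result))
```

The wrappers are needed because substitute only replaces symbols when it runs inside a function's environment; at top level it returns its argument unchanged.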

    The problem with pipe is again that you have two levels of indirection. When you call pipe(iris), that's like calling

    data <- promise(iris)
    . <- promise(data)
    substitute(.)
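
A runnable approximation of this case, again assuming base R's delayedAssign() as a stand-in for the pretend promise() function (sim_pipe is a hypothetical name, and this is a sketch of the idea rather than magrittr's actual code):

```r
# Roughly what happens inside pipe(): `data` is already a promise for
# whatever the caller wrote, and `.` becomes a promise for `data`.
sim_pipe <- function(data) {
  delayedAssign(".", data)
  deparse(substitute(.))    # one step: . -> data (never reaches iris)
}

iris_result   <- sim_pipe(iris)     # "data"
mtcars_result <- sim_pipe(mtcars)   # "data" as well
print(c(iris_result, mtcars_result))
```

Whatever is passed in, the answer is always "data", which matches examples 4 and 6 in the question.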
    

    You can see it does use the alias . if you look at

    foo <- function(...) sys.call()
    a %>% foo(a=.+2)
    # foo(., a = . + 2)
    

    Here we use sys.call to see how foo was invoked. You can see that magrittr added a . as the first argument, because the existing argument was an expression involving . (namely . + 2) rather than . alone. That existing argument was preserved as well.
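
To see when the extra first argument is added, it may help to compare three ways of writing the right-hand side. This sketch assumes magrittr is installed; x and the argument name b are illustrative choices, and the commented results follow the placeholder rule described in the magrittr documentation.

```r
library(magrittr)

# Returns the call exactly as the pipe issued it.
foo <- function(...) sys.call()
x <- 1:3

implicit <- x %>% foo()            # no dot at all: dot is inserted first
direct   <- x %>% foo(b = .)       # dot is itself an argument: nothing inserted
nested   <- x %>% foo(b = . + 2)   # dot only inside an expression: dot inserted first

print(implicit)   # foo(.)
print(direct)     # foo(b = .)
print(nested)     # foo(., b = . + 2)
```

In all three cases the function is called with the name . rather than with the original left-hand side expression, which is why a substitute call inside the function sees "." and not "x".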