I have used %>%, the magrittr pipe, in this answer the way its documentation shows it: with a function name and no empty parentheses on the RHS. I then got a comment saying that the recommended convention is to supply empty parentheses on the RHS.
library(magrittr)
1:3 %>% sum # The documentation calls this: Basic use
1:3 %>% sum() # It's also possible to supply empty parentheses
1:3 |> sum() # And it's similar to |>, the base pipe
An advantage might be that the syntax then matches that of |>
, the base pipe.
But on the other hand, %>%
could also be used like a function and there functions are typically provided without parentheses.
`%>%`(1:3, sum)
sapply(list(1:3), sum)
`%=>%` <- sapply
list(1:3) %=>% sum
do.call(sum, list(1:3))
`%<%` <- do.call
sum %<% list(1:3)
In this case, it looks like it's consistent to use it without parentheses.
On the other hand, when using the placeholder, parentheses need to be provided.
"axc" %>% sub("x", "b", .)
What are the disadvantages when providing a function without parentheses to the pipe and what are the good technical reasons to provide it with empty parentheses?
But on the other hand
%>%
could also be used like a function and there functions are typically provided without parentheses.
No, this is confusing things: there is no single way in which functions are “typically provided”, it entirely depends on the usage.
You use the examples of sapply
and do.call
. Both are higher-order functions, which means that they expect functions as arguments.1 Since they expect functions as arguments, we can pass a name which refers to a function. But instead of a name we can also pass an arbitrary expression which evaluates to a function.
… In fact, don’t get hung up on the fact that you are passing a name in your example, it’s a red herring. Here’s an example where we pass the result of an expression (which returns a function) instead:
make_adder = function (y) {
function (x) x + y
}
sapply(1 : 3, make_adder(2))
But this is potentially a distraction, because %>%
does not expect a function object as its second argument. Instead, it expects a function call expression.
In my example above, sapply
is a regular function, which evaluates its arguments using standard evaluation. Both its arguments, 1 : 3
, as well as make_adder(2)
, are evaluated and the results are passed to sapply
as arguments.2
%>%
is not a regular function: it suppresses standard evaluation of the second argument. Instead, it keeps the expression in its unevaluated form and manipulates it. The way it does that is fairly complex but in the simplest case it injects its first argument into the expression and subsequently evaluates it. Here’s some pseudocode to illustrate this:
`%>%` = function (lhs, rhs) {
# Get the unevaluated expression passed as `rhs`
rhs_expr = substitute(rhs)
new_rhs_expr = insert_first_argument_into(rhs_expr, lhs)
eval.parent(new_rhs_expr)
}
This works for any valid rhs
expression: sum()
, head(3)
, etc. %>%
transforms these into, respectively, sum(lhs)
, head(lhs, 3)
, etc., and evaluates the resulting expression.
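The pseudocode above can be turned into a small runnable sketch. This is only an illustration under simplifying assumptions, not magrittr's actual implementation: the names `%inject%` and inject_first are hypothetical, the placeholder (.) is not handled, and the lhs is inserted as an already evaluated value.

```r
# Hypothetical helper: insert `lhs` as the first argument of a call expression.
inject_first <- function(rhs_expr, lhs) {
  stopifnot(is.call(rhs_expr))
  # A call is list-like: element 1 is the function, the rest are its arguments.
  as.call(append(as.list(rhs_expr), list(lhs), after = 1))
}

# Hypothetical, simplified pipe that only supports call expressions on the RHS.
`%inject%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)               # unevaluated RHS, e.g. head(3)
  new_call <- inject_first(rhs_expr, lhs)   # e.g. head(<lhs value>, 3)
  eval(new_call, parent.frame())            # evaluate in the caller's environment
}

# Inspect the transformation on symbolic input:
print(inject_first(quote(sum()), quote(lhs)))    # sum(lhs)
print(inject_first(quote(head(3)), quote(lhs)))  # head(lhs, 3)

total <- 1:3 %inject% sum()
first_three <- 1:10 %inject% head(3)
print(total)
print(first_three)
```

Because the sketch only ever rewrites a call, a bare name such as sum on the RHS is rejected by the stopifnot() check.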
So far, this is perfectly consistent. However, the author of %>%
chose to allow an additional, entirely distinct usage: instead of passing a function call expression as rhs
, you can also pass a simple name. In that case, %>%
does something completely different. Instead of constructing a new call expression that injects lhs
, and evaluating that, it directly calls rhs(lhs)
:
`%>%` = function (lhs, rhs) {
rhs_expr = substitute(rhs)
if (is.name(rhs_expr)) {
rhs(lhs)
} else {
# (code from above.)
}
}
In other words, %>%
accepts two fundamentally different types of arguments as rhs
, and does different things for them.
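To make the two code paths concrete, here is a runnable sketch of the same idea. Again, this is an assumption-laden illustration rather than magrittr's real code; `%pipe2%` is a hypothetical name, and the placeholder is ignored.

```r
# Hypothetical two-branch pipe, mirroring the pseudocode above.
`%pipe2%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)
  if (is.name(rhs_expr)) {
    # Branch 1: bare name. Evaluate `rhs` to a function and call it on `lhs`.
    rhs(lhs)
  } else {
    # Branch 2: call expression. Inject `lhs` as first argument, then evaluate.
    new_call <- as.call(append(as.list(rhs_expr), list(lhs), after = 1))
    eval(new_call, parent.frame())
  }
}

via_name <- 1:3 %pipe2% sum    # handled by branch 1
via_call <- 1:3 %pipe2% sum()  # handled by branch 2
print(via_name)
print(via_call)
```

Both lines give the same result here, which is exactly why the difference between the two branches is easy to overlook.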
This isn’t in itself a problem yet. It becomes a problem if we pass a function factory as the rhs
. That’s a higher-order function which itself returns a function. make_adder
from above is such a function factory.
So: what does 1 : 3 %>% make_adder(2)
do? …
Error in make_adder(., 2) : unused argument (2)
Oh, right! make_adder(2)
is a function call expression, so the first definition of %>%
applies: transform the expression and evaluate it. So it attempts to evaluate make_adder(1 : 3, 2)
, and that fails, because make_adder
only expects one argument.
Luckily for our sanity we can use make_adder
with %>%
. This doesn’t even require additional rules or documentation. With a bit of thinking it follows directly from the first definition above: we need to add another layer of function call, because we want %>%
to call the function that is returned by make_adder
. The following works:
1 : 3 %>% make_adder(2)()
# 3 4 5
%>%
interpolated the lhs
such that new_rhs_expr
became make_adder(2)(1 : 3)
.
We could make this a bit more readable by assigning the return value of make_adder(2)
to a name:
add_2 = make_adder(2)
1 : 3 %>% make_adder(2)() # (1)
# \___________/
# v
# /‾‾‾\
1 : 3 %>% add_2() # (2)
We directly replaced a subexpression by a newly introduced name here. This is an extremely basic computer science concept, but it is so powerful that it has its own name: referential transparency. It’s a concept which makes reasoning about programs easier, because we know that we can always assign an arbitrary sub-expression to a name and use that name in its place in a piece of code: (1) and (2) are identical.
But, actually, referential transparency requires that we can also do the replacement in reverse, i.e. replace the name by the value that it refers to. Sure enough, this works, and we get our original expression back:
1 : 3 %>% add_2() # (1)
# \___/
# v
# /‾‾‾‾‾‾‾‾‾‾‾\
1 : 3 %>% make_adder(2)() # (2)
(1) and (2) are still identical.
But unfortunately it does not always work:
1 : 3 %>% add_2 # (1)
# \___/
# v
# /‾‾‾‾‾‾‾‾‾‾‾\
1 : 3 %>% make_adder(2) # (2)
(1) works, but (2) fails, even though we merely substituted add_2
with its definition. %>%
does not preserve referential transparency.3
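The three substitutions can be checked side by side. This sketch assumes magrittr is installed; the failing case is wrapped in tryCatch() so the script keeps running, and the exact wording of the error message may differ between versions.

```r
library(magrittr)

make_adder <- function(y) {
  function(x) x + y
}
add_2 <- make_adder(2)

# With parentheses, name and definition are interchangeable:
with_name_call <- 1:3 %>% add_2()
with_definition_call <- 1:3 %>% make_adder(2)()

# Without parentheses, only the name works:
with_bare_name <- 1:3 %>% add_2
with_bare_definition <- tryCatch(
  1:3 %>% make_adder(2),
  error = function(e) paste("Error:", conditionMessage(e))
)

print(with_name_call)
print(with_definition_call)
print(with_bare_name)
print(with_bare_definition)  # an error message instead of a result
```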
And that is why not using parentheses on the RHS is inconsistent, and why it is widely discouraged (e.g. by the tidyverse style guide). And it is also (as far as I understand) why the R core developers decided that |>
always requires a function call expression as its RHS, and you cannot omit the parentheses.
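The behaviour of the base pipe can be inspected directly. The sketch below assumes R 4.1 or later; the bare-name case is passed through parse() as text, because it would otherwise stop the whole script from parsing.

```r
# Requires R >= 4.1 (native pipe).

# |> is a purely syntactic rewrite: the parser turns the pipe into a plain call.
rewritten <- quote(1:3 |> sum())
print(rewritten)  # sum(1:3)

total <- 1:3 |> sum()
print(total)

# A bare function name on the RHS is rejected at parse time.
bare_name_parses <- tryCatch(
  {
    parse(text = "1:3 |> sum")
    TRUE
  },
  error = function(e) FALSE
)
print(bare_name_parses)  # FALSE
```

Since there is only one accepted form, replacing a name by its definition (or the reverse) on the RHS of |> cannot silently switch between two different behaviours.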
1 We have a special word for this concept because accepting functions as arguments used to be very uncommon in mainstream programming languages.
2 This is a simplification. The truth is more complicated, but irrelevant here. If you are curious, see R Language Definition: Argument evaluation.
3 Violating referential transparency in R is quite easy because R gives us a lot of control over how we want to evaluate expressions. And often this can be quite handy. But when not used with care it can cause confusing code and subtle bugs, and it is recommended to weigh violations of referential transparency carefully against the benefits.