I can't figure out what works and what doesn't with the native pipe. Here are two examples that I expected to work but that fail. I guess my problem is that I thought it would work like the magrittr pipe.
Is there some variation of option 1 or 2 that achieves the same result as 3?
library(tidyverse)
# 1. What I think will work, but does not work:
mtcars |>
  n_distinct(gear)
# 2. What I think will work, but does not work:
mtcars |>
  n_distinct(_$gear)
# 3. Does work
mtcars |>
  pull(gear) |>
  n_distinct()
EDIT: The answer is that it most likely depends on whether the function expects a vector or a data frame.
Both answers do a good job of answering this question, but @r2evans's answer is probably more helpful for others, so I will mark that as the solution.
The difference is that the function must be intended to operate on a data.frame (or something frame-like, such as a tbl_df), not on a vector. mutate/summarize and such all work on frames, whereas some of dplyr's functions are meant to operate on vectors, whether inside a call to mutate(...) or not. These non-frame functions must be given a vector.
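One way to see why an attempt like the first one fails is to look at how the native pipe rewrites the call. A minimal sketch in base R (R 4.1 or later assumed); quote() only captures the call, so nothing is evaluated and dplyr does not even need to be loaded:

```r
# The parser turns `lhs |> f(args)` into `f(lhs, args)` before anything runs.
piped <- quote(mtcars |> n_distinct(gear))
print(piped)

# The whole data frame lands in the first (vector) slot, and `gear` is
# passed as a bare name that would be looked up outside of mtcars.
identical(piped, quote(n_distinct(mtcars, gear)))
```

So the function never gets a chance to look for gear inside mtcars; it just receives a frame plus an unknown name.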
TLDR: with() is a cheater function that supports the non-standard evaluation you're looking for: mtcars |> with(n_distinct(gear)) works (among many other expressions).
You can tell which verbs can be used the way you tried here (unwrapped, so to speak) by checking their arguments: if the first argument is something like .data=, data=, or x= (all expecting a data.frame-like object), then the verb can be used immediately after |> or %>%. For instance, mutate, summarize, and reframe all have something like this in their help pages:
Arguments:
.data: A data frame, data frame extension (e.g. a tibble), or a lazy
data frame (e.g. from dbplyr or dtplyr). See _Methods_,
below, for more details.
Even tidyr functions (that work on the top level like that) are similar, with
Usage:
pivot_wider(
data,
...,
id_cols = NULL,
<truncated>
Whereas n_distinct shows this for all of its arguments:
Usage:
n_distinct(..., na.rm = FALSE)
Arguments:
...: Unnamed vectors. If multiple vectors are supplied, then they
should have the same length.
where its first (and optionally more) argument is a vector.
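The same check can be done without opening the help pages. A rough sketch, assuming dplyr and tidyr are installed; first_arg is a hypothetical helper written for this illustration, not part of either package:

```r
# Hypothetical helper: name of a function's first formal argument.
first_arg <- function(f) names(formals(f))[1]

first_arg(dplyr::mutate)       # ".data": takes a frame, can follow |> directly
first_arg(tidyr::pivot_wider)  # "data":  takes a frame, can follow |> directly
first_arg(dplyr::n_distinct)   # "...":   takes vectors, needs with(), pull(), etc.
```

This is only a heuristic: the argument name hints at what the function expects, but the help page remains the authority.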
I infer that the intended use of n_distinct here is to return an integer, so we can easily adapt your first attempt to get what we need:
n_distinct(mtcars$gear)
# [1] 3
mtcars |> with(n_distinct(gear))
# [1] 3
mtcars |>
  summarize(ngears = n_distinct(gear)) |>
  pull(ngears)
# [1] 3
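If the goal is specifically to keep a $-style extraction in the pipeline (as in the second attempt), one possible variation is an anonymous function, which the native pipe accepts when it is wrapped in parentheses and called. A sketch assuming R 4.1 or later and dplyr loaded; the argument name d is arbitrary:

```r
library(dplyr)

# The anonymous function receives the whole frame and extracts the vector itself.
n_gears <- mtcars |> (\(d) n_distinct(d$gear))()
n_gears
# [1] 3
```

In R 4.3 and later the placeholder may also head an extraction chain, so mtcars |> _$gear |> n_distinct() should work as well, though that feature was introduced as experimental.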
You asked about dplyr-specific verbs and the pipe, but the notion that the counting function (n_distinct) does not operate on a frame by itself also holds in the very similar package data.table, where its functions need to operate either on a vector or within its [-scope (which is analogous in effect to needing to be within dplyr's verbs):
library(data.table)  # needed for the unqualified as.data.table() and uniqueN() below
data.table::uniqueN(mtcars$gear)
as.data.table(mtcars)[, uniqueN(gear)]
# blend of dplyr/data.table
as.data.table(mtcars)[, n_distinct(gear)]
The main reason for this is that dplyr and data.table both allow non-standard evaluation (NSE) of column names. NSE is supported in a few base R functions (with, within, subset, and transform come to mind; there are others), but it is prevalent in dplyr and data.table. NSE is how you are able to do something like
mtcars |>
  summarize(ngears = n_distinct(gear))
and not have to reference mtcars$gear instead of gear. (For a few reasons, using mtcars$gear inside of mutate/summarize/... is actively discouraged in dplyr anyway.)
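To make the NSE idea a little more concrete, here is a simplified base-R sketch of what functions like with() do: the expression is captured unevaluated and then evaluated with the data frame's columns in scope. This only illustrates the principle; dplyr and data.table use their own, more elaborate machinery.

```r
# Capture the expression without evaluating it; `gear` is just a symbol here.
expr <- quote(length(unique(gear)))

# Evaluate it with mtcars acting as the lookup environment for column names.
via_eval <- eval(expr, mtcars)

# with() does essentially the same thing in one step.
via_with <- with(mtcars, length(unique(gear)))

c(via_eval, via_with)
# [1] 3 3
```

A plain function such as n_distinct does none of this capturing, which is why it has to be handed the vector itself or be called inside something that provides the column scope.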