I have seen many answers stating that ungroup() must follow every group_by(). However, I can't find the answer to one related question:
When performing sequential calculations, each with a different group_by(), is it necessary to ungroup() explicitly, or is ungroup() implied in the group_by()?
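library(dplyr)

# Explicit ungroup() between the two group_by() calls
with <- iris %>%
  group_by(Petal.Width) %>%
  ungroup() %>%
  group_by(Species) %>%
  count() %>%
  ungroup()

# No ungroup() between the two group_by() calls
without <- iris %>%
  group_by(Petal.Width) %>%
  group_by(Species) %>%
  count() %>%
  ungroup()

identical(with, without)
# [1] TRUE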
The answer seems to be that an intervening ungroup() is unnecessary. Am I missing anything? Are there cases where there would be a difference?
Reading the help page is a good place to start. The ?group_by help page describes the .add argument:
.add: When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add = TRUE.
So, as long as you don't override .add's default value of FALSE, you can skip the ungroup() immediately before a group_by(). Any existing groups will be overridden.
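To see the difference the .add argument makes, here is a minimal sketch using the same iris data as the question. The object names replaced and added are only illustrative; group_vars() is the dplyr helper that reports the current grouping columns.

```r
library(dplyr)

# Default (.add = FALSE): the second group_by() replaces the first grouping.
replaced <- iris %>%
  group_by(Petal.Width) %>%
  group_by(Species)
group_vars(replaced)
# [1] "Species"

# .add = TRUE: the second group_by() adds to the existing grouping.
added <- iris %>%
  group_by(Petal.Width) %>%
  group_by(Species, .add = TRUE)
group_vars(added)
# [1] "Petal.Width" "Species"
```

So the only case where a preceding ungroup() changes the result is when the following group_by() is called with .add = TRUE.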
It is worth noting that since dplyr version 1.1.0, you can use the .by argument inside other functions like mutate() and summarize(). It applies the grouping only for that call and returns an ungrouped data frame. Generally, I much prefer this to group_by() and ungroup().
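As a rough illustration of the .by approach, assuming dplyr 1.1.0 or later is installed, the sketch below computes a per-species mean both ways. The names with_by, with_group and mean_sepal are made up for this example.

```r
library(dplyr)  # the .by argument requires dplyr >= 1.1.0

# Per-call grouping: the result comes back ungrouped.
with_by <- iris %>%
  mutate(mean_sepal = mean(Sepal.Length), .by = Species)

# Persistent grouping: the result stays grouped until ungroup() is called.
with_group <- iris %>%
  group_by(Species) %>%
  mutate(mean_sepal = mean(Sepal.Length))

is_grouped_df(with_by)
# [1] FALSE
is_grouped_df(with_group)
# [1] TRUE
```

Both versions compute the same column; the difference is only whether the grouping lingers on the result and needs an ungroup() afterwards.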