ungroup() before group_by()?


I have seen many answers stating that ungroup() must follow every group_by(). However, I can't find the answer to one related question:

When performing sequential calculations, each with a different group_by() -- is it necessary to ungroup() explicitly, or is ungroup() implied in the group_by()?

with <- iris %>%
  group_by(Petal.Width) %>%
  ungroup() %>%
  group_by(Species) %>%
  count() %>%
  ungroup()

without <- iris %>%
  group_by(Petal.Width) %>%
  group_by(Species) %>%
  count() %>%
  ungroup()

identical(with, without)
# [1] TRUE

The answer seems to be that an intermediate ungroup() is unnecessary. Am I missing anything? Are there cases where it would make a difference?


Solution

  • Reading the help page is a good place to start. The ?group_by help page describes the .add argument:

    .add

    When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add = TRUE.

    So, as long as you leave .add at its default value of FALSE, you can skip the ungroup() immediately before a group_by(). The new group_by() call replaces any existing groups.
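
    To see where the two approaches would differ, here is a minimal sketch using the same iris data as the question. The object names replaced and added are just illustrative; group_vars() is the dplyr helper that lists the current grouping columns.

    ```r
    library(dplyr)

    # Default (.add = FALSE): the second group_by() replaces the first grouping
    replaced <- iris %>%
      group_by(Petal.Width) %>%
      group_by(Species)
    group_vars(replaced)
    # [1] "Species"

    # .add = TRUE: the second group_by() appends to the existing grouping,
    # so here an intermediate ungroup() would change the result
    added <- iris %>%
      group_by(Petal.Width) %>%
      group_by(Species, .add = TRUE)
    group_vars(added)
    # [1] "Petal.Width" "Species"
    ```

    So the only case where the intermediate ungroup() matters is when the following group_by() is called with .add = TRUE.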

    It's also worth noting that since dplyr version 1.1.0, you can use the .by argument inside verbs such as mutate() and summarize(). It applies the grouping only for that one call and returns an ungrouped data frame. Generally, I much prefer this to group_by() and ungroup().
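
    As a rough sketch of the .by style (assuming dplyr 1.1.0 or later is installed), the count from the question could be written without any group_by() or ungroup(). The name by_species is just illustrative.

    ```r
    library(dplyr)

    # Per-call grouping: Species is used as the grouping column for this
    # summarize() only, and the result comes back ungrouped
    by_species <- iris %>%
      summarize(n = n(), .by = Species)

    is_grouped_df(by_species)
    # [1] FALSE
    ```

    Note that the result here is a plain data frame rather than a tibble, because iris is a plain data frame, so it will not be identical() to the group_by() version even though the counts match.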