
Why does dplyr::tally() remove last grouping variable from output?


When I run dplyr::tally() on a grouped data frame, the last grouping variable is removed from the groups of the output. I'm sure this is intended behaviour, but I don't understand why, or where it is documented. Any ideas?

library(dplyr) # 1.0.10

# single grouping variable returns ungrouped output
starwars %>%
  group_by(species) %>%
  tally() %>%
  groups()

#> list()

# three grouping variables: output is grouped by the first and second variables only
starwars %>%
  group_by(species, eye_color, skin_color) %>%
  tally() %>%
  groups()

#> [[1]]
#> species
#>
#> [[2]]
#> eye_color

Solution

  • As @Konrad Rudolph mentioned, tally() calls summarise() internally, and summarise()'s default behaviour is to drop the last grouping level.

    From ?summarise:

    .groups [Experimental] Grouping structure of the result.

    "drop_last": dropping the last level of grouping. This was the only supported option before version 1.0.0.

    "drop": All levels of grouping are dropped.

    "keep": Same grouping structure as .data.

    "rowwise": Each row is its own group.

    Additionally,

    When .groups is not specified, it is chosen based on the number of rows of the results:

    If all the results have 1 row, you get "drop_last".

    Since tally() returns exactly one row per group in your case, the "drop_last" behaviour is used.
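A minimal sketch to check this, assuming dplyr 1.0.x and its bundled starwars data (the object names grouped, via_tally, kept, etc. are only illustrative). It compares the groups left by tally() with those left by a plain summarise(n = n()), then shows how an explicit .groups value changes the result. As far as I know tally() itself has no .groups argument, so with tally() the usual route is to call ungroup() afterwards.

```r
library(dplyr)  # assumes dplyr 1.0.x, which ships the starwars data set

grouped <- starwars %>%
  group_by(species, eye_color, skin_color)

# tally(): one row per group, so the default "drop_last" applies
via_tally <- grouped %>% tally()

# summarise() with no .groups and one row per group also uses "drop_last"
# (it prints a message saying which groups the output keeps)
via_summarise <- grouped %>% summarise(n = n())

group_vars(via_tally)      # c("species", "eye_color")
group_vars(via_summarise)  # c("species", "eye_color")

# Keep every grouping variable by asking summarise() for it explicitly
kept <- grouped %>% summarise(n = n(), .groups = "keep")
group_vars(kept)           # c("species", "eye_color", "skin_color")

# Drop all grouping explicitly
dropped <- grouped %>% summarise(n = n(), .groups = "drop")
group_vars(dropped)        # character(0)

# With tally(), remove the remaining groups afterwards instead
ungrouped <- grouped %>% tally() %>% ungroup()
```

The counts themselves are identical in every variant; only the grouping metadata attached to the result differs.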