
R - getting count of maximum-sized sub-group when summarising at prior group_by level


What I am trying to do is along the lines of the following:

library(tidyverse)

starwars %>% 
  filter(!is.na(gender)) %>% 
  group_by(gender) %>% 
  summarise(total_count = n(), max_species_count_per_gender = max(count(species)))

After a single group_by on gender, I want one summary column with the total count per gender and a second column with the size of the largest subgroup within each gender for a given trait (in this case, species). The code above does not work and returns the error message:

Caused by error in `UseMethod()`:
! no applicable method for 'count' applied to an object of class "character"

What I am trying to end up with is something along the lines of

# A tibble: 2 × 3
  gender    total_count     max_species_count_per_gender
  <chr>           <int>                            <int>
1 feminine           17                   some_smaller_x
2 masculine          66                   some_smaller_y

Is this something I can approach as part of a summarise action, or will I need to do something else? Thank you for your help.


Solution

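  • You could summarize twice: first count the rows for each species and gender combination, then summarize those counts per gender. The .by argument (available in dplyr 1.1.0 and later) is an alternative to group_by & ungroup; either would work.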

    library(tidyverse)
    
    starwars %>%
      filter(!is.na(gender)) %>%
      summarize(
        sub_count = n(),
        .by = c(species, gender)
      ) %>%
      summarize(
        total_count = sum(sub_count),
        max_species_count = max(sub_count),
        .by = gender
      )
    #> # A tibble: 2 × 3
    #>   gender    total_count max_species_count
    #>   <chr>           <int>             <int>
    #> 1 masculine          66                26
    #> 2 feminine           17                 9
    

    Created on 2024-02-29 with reprex v2.0.2
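
For comparison, here is a minimal sketch of two other ways to write this. The first is the group_by form of the two-step approach mentioned above. The second stays inside a single summarise by using table(), which works on the plain character vector that summarise passes for species (count() fails there because it expects a data frame). The toy tibble and the names toy, two_step and one_step are hypothetical, chosen only so the result is easy to check by hand; they are not from the original post.

```r
library(dplyr)

# Hypothetical toy data (not from the original post)
toy <- tibble(
  gender  = c("feminine", "feminine", "feminine",
              "masculine", "masculine", "masculine", "masculine"),
  species = c("Human", "Human", "Droid",
              "Human", "Human", "Human", "Wookiee")
)

# Variant 1: two steps, using count() and group_by() instead of .by
two_step <- toy %>%
  count(gender, species, name = "sub_count") %>%  # rows per gender/species pair
  group_by(gender) %>%
  summarise(
    total_count = sum(sub_count),
    max_species_count = max(sub_count)
  )

# Variant 2: one summarise; table() tallies the species vector within each gender.
# useNA = "ifany" keeps missing species as their own category, as grouping would.
one_step <- toy %>%
  group_by(gender) %>%
  summarise(
    total_count = n(),
    max_species_count = max(table(species, useNA = "ifany"))
  )

print(two_step)
print(one_step)
```

With this toy data both variants should report 3 rows and a largest species count of 2 for feminine, and 4 rows and a largest species count of 3 for masculine. Note that table() drops NA values by default, so without useNA = "ifany" a block of missing species would not be counted as a subgroup.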