R: Counting the number of groups having each value of a variable?

Sample data:

tibble::tibble(id = c("101", "101", "101", "102", "102", "103"), 
    color = c("Blue", "Blue", "Red", "Red", "Green", "Green"))

#> # A tibble: 6 × 2
#>   id    color 
#>   <chr> <chr> 
#> 1 101   Blue  
#> 2 101   Blue  
#> 3 101   Red   
#> 4 102   Red   
#> 5 102   Green 
#> 6 103   Green

^{Created on 2024-09-05 with reprex v2.1.1}

I want to count the number of id's which have a certain value of color, for every value of color.

Sample Output:

tibble::tibble(color = c("Blue", "Red", "Green"), number_of_ids_having_color = c("1", 
    "2", "2"))

#> # A tibble: 3 × 2
#>   color number_of_ids_having_color
#>   <chr>       <chr>                           
#> 1 Blue        1                               
#> 2 Red         2                               
#> 3 Green       2

^{Created on 2024-09-05 with reprex v2.1.1}

This is sort of similar to this question, the differences being that (a) my variable to condition on is categorical and (b) I want to make this calculation for every single value of said variable.

As a little bonus question (lower priority), let's say I also have another variable, most_important_color and I'd like to also count the number of id's having each color as their most_important_color:

tibble::tibble(id = c("101", "101", "101", "102", "102", "103"), 
    color = c("Blue", "Blue", "Red", "Red", "Green", "Green"), 
    most_important_color = c("F", "F", "T", "T", "F", "T"))

#> # A tibble: 6 × 3
#>   id    color most_important_color
#>   <chr> <chr>       <chr>               
#> 1 101   Blue        F                   
#> 2 101   Blue        F                   
#> 3 101   Red         T                   
#> 4 102   Red         T                   
#> 5 102   Green       F                   
#> 6 103   Green       T

^{Created on 2024-09-05 with reprex v2.1.1}

The final end product, incorporating the solution to my first (and more important question) would look like this:

tibble::tibble(color_group = c("Blue", "Red", "Green"), number_of_ids_having_color = c("1", 
    "2", "2"), number_of_ids_having_color_as_most_important = c("0", 
    "2", "1"))

#> # A tibble: 3 × 3
#>   color_group number_of_ids_having_color number_of_ids_having_color_as_most_im…¹
#>   <chr>       <chr>                      <chr>                                  
#> 1 Blue        1                          0                                      
#> 2 Red         2                          2                                      
#> 3 Green       2                          1                                      
#> # ℹ abbreviated name: ¹number_of_ids_having_color_as_most_important

^{Created on 2024-09-05 with reprex v2.1.1}

It's fairly straightforward to do this separately and then join the results of that operation to that of above, but I'm interested if there's a way to do all of this in a single pipe without any need to create intermediate objects and join them.

Tidyverse solutions strongly preferred since I share this code with people who are mostly comfortable with tidyverse.

Thank you!

Solution

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df_colors <- tibble::tibble(
  id = c("101", "101", "101", "102", "102", "103"), 
  color = c("Blue", "Blue", "Red", "Red", "Green", "Green"),
  most_important_color = c(F, F, T, T, F, T)
)

df_colors |>
  group_by(color) %>% 
  summarise(
    ids = n_distinct(id),
    most_important = n_distinct(if_else(most_important_color, id, NA_character_), na.rm = T),
    records = n()
  )
#> # A tibble: 3 × 4
#>   color   ids most_important records
#>   <chr> <int>          <int>   <int>
#> 1 Blue      1              0       2
#> 2 Green     2              1       2
#> 3 Red       2              2       2

^{Created on 2024-09-05 with reprex v2.1.1}