Search code examples
rcountaggregateuniquedistinct

Group by and count unique values in several columns in R


I see a lot of examples of how to count values for one column. I can't find a solution for counting for several columns.

I have data like

city col1 col2 col3 col4 

I want to group by city and count unique values in col1, col2, col3...

aggregate(. ~ city, hh2, function(x) length(unique(x)))

I can count using aggregate, but it replaces city names with numbers and it's unclear how to revert it.


Solution

  • Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:

    my_data <- data.frame(
      city = c(rep("A", 3), rep("B", 3)),
      col1 = 1:6,
      col2 = 0,
      col3 = c(1:3, 4, 4, 4),
      col4 = 1:2
    )
    
    library(dplyr)
    my_data %>%
      group_by(city) %>%
      summarize(across(col1:col4, n_distinct))
    
    # A tibble: 2 x 5
      city   col1  col2  col3  col4
    * <chr> <int> <int> <int> <int>
    1 A         3     1     3     2
    2 B         3     1     1     2