I see a lot of examples of how to count values for one column. I can't find a solution for counting for several columns.
I have data like
city col1 col2 col3 col4
I want to group by city and count unique values in col1, col2, col3...
aggregate(. ~ city, hh2, function(x) length(unique(x)))
I can count using aggregate, but it replaces city names with numbers and it's unclear how to revert it.
Here's an approach using dplyr::across
, which is a handy way to calculate across multiple columns:
my_data <- data.frame(
city = c(rep("A", 3), rep("B", 3)),
col1 = 1:6,
col2 = 0,
col3 = c(1:3, 4, 4, 4),
col4 = 1:2
)
library(dplyr)
my_data %>%
group_by(city) %>%
summarize(across(col1:col4, n_distinct))
# A tibble: 2 x 5
city col1 col2 col3 col4
* <chr> <int> <int> <int> <int>
1 A 3 1 3 2
2 B 3 1 1 2