I have the following DF
x = data.frame('grp' = c(1,1,1,2,2,2),'a' = c(1,2,1,1,2,1), 'b'= c(6,5,6,6,2,6), 'c' = c(0.1,0.2,0.4,-1, 0.9,0.7))
grp a b c
1 1 1 6 0.1
2 1 2 5 0.2
3 1 1 6 0.4
4 2 1 6 -1.0
5 2 2 2 0.9
6 2 1 6 0.7
I want to count distinct levels of (a,b)
for each group where c >= 0.1
I have tried using dplyr
for this using group_by
& summarise
but not getting the desired result
x %>% group_by(grp) %>% summarise(count = n_distinct(c(a,b)[c >= 0.1]))
For the above case I would expect the following result
grp count
<dbl> <int>
1 1 2
2 2 2
However using the above query I am getting the following result
grp count
<dbl> <int>
1 1 4
2 2 3
Logically the above output seems to be solving for all unique values of a concat list of (a,b)
but not what I require
Any pointers, really appreciate any help
We can paste
a
and b
columns and count distinct values in each group.
library(dplyr)
x %>%
mutate(col = paste(a, b, sep = "_")) %>%
group_by(grp) %>%
summarise(count = n_distinct(col[c >= 0.1]))
# grp count
# <dbl> <int>
#1 1 2
#2 2 2