R group_by and count distinct values in dataframe column with condition, using mutate
This is similar to the question R group by | count distinct values grouping by another column but slightly different in that i want to mutate together with a condition. to count distinct values in each group, but also present in the vector c
id = c( 1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6 )
val = c(100,100,200,300,400,500,500,500,600,600,200,200,300,500)
df <- data.frame(id=id, val=val)
c= c(200,500)
I know the following code is wrong, but it gives and idea of what i am trying to do. So, I want to create a fourth column, by grouping df by id, where the count of distinct values is given when it is present in c
x <- df %>% group_by(id) %>%
mutate(distinctValues = n_distinct(val %in% c ))
If we consider the id = 6, The distict values present in it are 200,300 and 500. But 300 is not present in c. So the count is = 2.
so the output column will be
c(0,0,1,1,0,1,1,1,1,1,2,2,2,2)
Since c
is unique, you can approach it from the other way - count the number of c
values that show up in val
.
df %>%
group_by(id) %>%
mutate(distinctValues = sum(c %in% val))
# # A tibble: 14 x 3
# # Groups: id [6]
# id val distinctValues
# <dbl> <dbl> <int>
# 1 1 100 0
# 2 1 100 0
# 3 2 200 1
# 4 2 300 1
# 5 3 400 0
# 6 4 500 1
# 7 4 500 1
# 8 5 500 1
# 9 5 600 1
# 10 5 600 1
# 11 6 200 2
# 12 6 200 2
# 13 6 300 2
# 14 6 500 2
You could also use distinctValues = sum(unique(val) %in% c)
if that seems clearer - it might be a tad less efficient, but not enough to matter unless your data is massive.