I have a tibble of values:
raw = tibble(
  labels = rep(rep(1:4, each = 3), 2),
  group = rep(c("A", "B"), each = 12),
  value = c(1,2,3,3,4,5,6,7,2,2,12,1,7,3,3,3,4,5,6,3,2,2,7,1))
For each group (A and B) separately, I want to select the values that occur in at least half of that group's four labels. The result could look like this:
Res = tibble(group = c("A","B"),
value = c("1,2,3","2,3,7"))
It would be helpful to have a flexible function that can do the same selection for a different threshold, e.g. at least 1/3 of the labels.
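Here is one option. Group by 'group' and 'value' and count the number of distinct 'labels' in which each value appears ('n'). Then group by 'group' alone and filter
the rows where 'n' is greater than or equal to half the number of distinct 'labels' in that group (i.e. 50%), and finally keep the distinct
rows of 'group', 'value'. For a different threshold such as 1/3, change the divisor in the filter step from 2 to 3.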
library(dplyr)
raw %>%
  group_by(group, value) %>%
  mutate(n = n_distinct(labels)) %>%
  group_by(group) %>%
  filter(n >= n_distinct(labels)/2) %>%
  select(-n) %>%
  ungroup() %>%
  distinct(group, value)
# A tibble: 6 x 2
# group value
# <chr> <dbl>
#1 A 1
#2 A 2
#3 A 3
#4 B 7
#5 B 3
#6 B 2
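To make the threshold flexible and return one comma-separated string per group, as in the requested 'Res', the same idea could be wrapped in a function. This is only a sketch: the name common_values and its prop argument are made up here for illustration and are not part of dplyr, and the column names 'group', 'labels' and 'value' are assumed to match the example data.

```r
library(dplyr)

# Example data from the question
raw <- tibble(
  labels = rep(rep(1:4, each = 3), 2),
  group  = rep(c("A", "B"), each = 12),
  value  = c(1,2,3,3,4,5,6,7,2,2,12,1,7,3,3,3,4,5,6,3,2,2,7,1))

# Hypothetical helper: keep the values that appear in at least `prop`
# of the distinct labels of each group, collapsed to one string per group.
common_values <- function(data, prop = 1/2) {
  data %>%
    # one row per group/label/value, so repeats within a label count once
    distinct(group, labels, value) %>%
    group_by(group) %>%
    # number of distinct labels available in this group
    mutate(n_labels = n_distinct(labels)) %>%
    group_by(group, value) %>%
    # after distinct(), n() is the number of labels containing the value
    filter(n() >= prop * first(n_labels)) %>%
    group_by(group) %>%
    summarise(value = paste(sort(unique(value)), collapse = ","))
}

common_values(raw)             # at least half of the labels
common_values(raw, prop = 1/3) # at least a third of the labels
```

With only four labels per group, both 1/2 and 1/3 require a value to appear in at least two labels, so the two calls select the same values for this data; a stricter threshold such as prop = 3/4 would keep only the values found in three or more labels.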