I have the following data frame:
a = data.frame(PA = c("A", "A", "A", "B", "B"), Family = c("aa", "ab", "ac", "aa", "ad"))
What I want to obtain is a count of unique 'Family' strings (aa, ab, ac, ad) in each PA (A or B) based on all possible PAs. For example, aa is a unique string for A and B, but since it occurs in both PAs I don't want it. On the other hand, ab and ac are unique for PA A and only occur in PA A: that's what I want.
Using dplyr, I was doing something like:
a %>% group_by(PA) %>%
  summarise(count_family = n_distinct(Family))
But this only returns the number of distinct Families inside each PA. What I want is the number of Families that occur in exactly one PA, judged across all PAs.
Here's a tidyverse approach. First remove all duplicated Family values, then group_by(PA) and count.
library(tidyverse)
a %>% group_by(Family) %>%
filter(n() == 1) %>%
group_by(PA) %>%
summarize(count_family = n())
# A tibble: 2 x 2
  PA    count_family
  <chr>        <int>
1 A                2
2 B                1
Dropping the final group_by(PA) and summarise() shows which Families are kept:
# A tibble: 3 x 2
# Groups:   Family [3]
  PA    Family
  <chr> <chr>
1 A     ab
2 A     ac
3 B     ad
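
One caveat: filter(n() == 1) relies on each PA/Family pair appearing only once, as it does in the example data. If a Family can repeat within the same PA, it would be dropped even though it belongs to a single PA. A minimal sketch of a variant that may handle that case, checking n_distinct(PA) instead of n(); the data frame `b` and the name `res` are hypothetical, made up here to show a repeated row:

```r
library(dplyr)

# Hypothetical data: same as `a`, plus a repeated ("A", "ab") row
b <- data.frame(PA     = c("A", "A", "A", "A", "B", "B"),
                Family = c("aa", "ab", "ab", "ac", "aa", "ad"))

res <- b %>%
  group_by(Family) %>%
  filter(n_distinct(PA) == 1) %>%               # keep Families found in exactly one PA
  group_by(PA) %>%
  summarise(count_family = n_distinct(Family))  # count each Family once per PA

res
```

On the original `a`, where no pair repeats, this should give the same counts as the filter(n() == 1) version.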