Count unique strings that only occur in a single group based on all possible groups

I have the following data frame:

a = data.frame(PA = c("A", "A", "A", "B", "B"), Family = c("aa", "ab", "ac", "aa", "ad"))

What I want to obtain is a count of unique 'Family' strings (aa, ab, ac, ad) in each PA (A or B) based on all possible PAs. For example, aa is a unique string for A and B, but since it occurs in both PAs I don't want it. On the other hand, ab and ac are unique for PA A and only occur in PA A: that's what I want.

Using dplyr I was doing something like:

a %>% group_by(PA) %>%
  summarise(count_family = n_distinct(Family))

But this only counts the distinct Families within each PA. What I want is the number of Families that occur in exactly one PA, judged across all PAs.

Solution

Here's a tidyverse approach.

First drop every Family that appears more than once in the data, then group_by(PA) and count what is left.

library(tidyverse)

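a %>% group_by(Family) %>%        # one group per Family
  filter(n() == 1) %>%            # keep Families that appear in exactly one row
  group_by(PA) %>%                # regroup the survivors by PA
  summarize(count_family = n())   # count the remaining Families per PA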

Output

# A tibble: 2 x 2
  PA    count_family
  <chr>        <int>
1 A                2
2 B                1

Output before `summarise()`

# A tibble: 3 x 2
# Groups:   Family [3]
  PA    Family
  <chr> <chr> 
1 A     ab    
2 A     ac    
3 B     ad
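
One caveat: `filter(n() == 1)` counts rows, so it assumes each PA/Family pair appears only once. If the same Family can be listed several times within a single PA, a variant that counts distinct PAs per Family may be safer. The sketch below is only an illustration under that assumption; `a2` is hypothetical data (the original `a` plus a repeated A/ab row) and `result` is a name I made up.

```r
library(dplyr)

# Hypothetical data: same as `a`, but "ab" is listed twice inside PA "A"
a2 <- data.frame(
  PA     = c("A", "A", "A", "A", "B", "B"),
  Family = c("aa", "ab", "ab", "ac", "aa", "ad")
)

result <- a2 %>%
  group_by(Family) %>%
  filter(n_distinct(PA) == 1) %>%                # keep Families seen in exactly one PA
  group_by(PA) %>%
  summarise(count_family = n_distinct(Family))   # count each Family once per PA

result
```

With this data, `n() == 1` would discard "ab" because it occupies two rows, while `n_distinct(PA) == 1` keeps it. Note that in both versions a PA with no exclusive Families simply does not appear in the result.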
