I have this following dataframe
table=data.frame(ID=c("A", "A", "A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "D", "D"),
obj=c("fu", "sp", "sp", "sp", "sp", "sp", "sp", "fu", "sp", "fu", "sp", "sp", "fu", "sp"),
gens=c(NA, NA, "AZJE", NA, "JAUE", NA, "AZJE", NA, NA, NA, "AUEJ", NA, NA, "EUF"))
> table
ID obj gens
1 A fu <NA>
2 A sp <NA>
3 A sp AZJE
4 A sp <NA>
5 A sp JAUE
6 A sp <NA>
7 A sp AZJE
8 B fu <NA>
9 B sp <NA>
10 C fu <NA>
11 C sp AUEJ
12 C sp <NA>
13 D fu <NA>
14 D sp EUF
I would like to add a variable that count for each ID, the number of unique value in gens.
The desired output is as below
> output
ID obj gens res
1 A fu <NA> 2
2 A sp <NA> 2
3 A sp AZJE 2
4 A sp <NA> 2
5 A sp JAUE 2
6 A sp <NA> 2
7 A sp AZJE 2
8 B fu <NA> 0
9 B sp <NA> 0
10 C fu <NA> 1
11 C sp AUEJ 1
12 C sp <NA> 1
13 D fu <NA> 1
14 D sp EUF 1
I could calculate the number of value for each ID but i don't know how to count the unique value. Please see what I have tried below
table_test = table %>%
group_by(ID) %>%
mutate(res = sum(obj=="sp" & !is.na(gens )))
> table_test
# A tibble: 14 × 4
# Groups: ID [4]
ID obj gens res
<chr> <chr> <chr> <int>
1 A fu NA 3
2 A sp NA 3
3 A sp AZJE 3
4 A sp NA 3
5 A sp JAUE 3
6 A sp NA 3
7 A sp AZJE 3
8 B fu NA 0
9 B sp NA 0
10 C fu NA 1
11 C sp AUEJ 1
12 C sp NA 1
13 D fu NA 1
14 D sp EUF 1
table |>
group_by(ID) |>
mutate(res = length(unique(setdiff(gens,NA))))