Search code examples
rsumtidyversemutate

mutate a variable that count all unique value of a variable


I have this following dataframe

table=data.frame(ID=c("A", "A", "A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "D", "D"),
                 obj=c("fu", "sp", "sp", "sp", "sp", "sp", "sp", "fu", "sp", "fu", "sp", "sp", "fu", "sp"),
                 gens=c(NA, NA, "AZJE", NA, "JAUE", NA, "AZJE", NA, NA, NA, "AUEJ", NA, NA, "EUF"))

> table
   ID obj gens
1   A  fu <NA>
2   A  sp <NA>
3   A  sp AZJE
4   A  sp <NA>
5   A  sp JAUE
6   A  sp <NA>
7   A  sp AZJE
8   B  fu <NA>
9   B  sp <NA>
10  C  fu <NA>
11  C  sp AUEJ
12  C  sp <NA>
13  D  fu <NA>
14  D  sp  EUF

I would like to add a variable that count for each ID, the number of unique value in gens.

The desired output is as below

> output
   ID obj gens res
1   A  fu <NA>   2
2   A  sp <NA>   2
3   A  sp AZJE   2
4   A  sp <NA>   2
5   A  sp JAUE   2
6   A  sp <NA>   2
7   A  sp AZJE   2
8   B  fu <NA>   0
9   B  sp <NA>   0
10  C  fu <NA>   1
11  C  sp AUEJ   1
12  C  sp <NA>   1
13  D  fu <NA>   1
14  D  sp  EUF   1

I could calculate the number of value for each ID but i don't know how to count the unique value. Please see what I have tried below

table_test = table %>% 
  group_by(ID)  %>%
  mutate(res = sum(obj=="sp" & !is.na(gens )))
 
> table_test
# A tibble: 14 × 4
# Groups:   ID [4]
   ID    obj   gens    res
   <chr> <chr> <chr> <int>
 1 A     fu    NA        3
 2 A     sp    NA        3
 3 A     sp    AZJE      3
 4 A     sp    NA        3
 5 A     sp    JAUE      3
 6 A     sp    NA        3
 7 A     sp    AZJE      3
 8 B     fu    NA        0
 9 B     sp    NA        0
10 C     fu    NA        1
11 C     sp    AUEJ      1
12 C     sp    NA        1
13 D     fu    NA        1
14 D     sp    EUF       1


Solution

  • table |> 
    group_by(ID) |>
     mutate(res = length(unique(setdiff(gens,NA))))