r dataframe dplyr tidyverse

Find the first occurrence of each value in a column by groups


I have a dataframe as follows:

set.seed(456)
df <- data.frame(ID = rep(c('a', 'b'), each = 5), x = sample(1:3, 10, TRUE))

   ID x
1   a 1
2   a 1
3   a 3
4   a 2
5   a 1
6   b 3
7   b 1
8   b 2
9   b 3
10  b 2

I want to create a new column y that holds the value of x the first time that value occurs, and NA for every later occurrence. The operation should be done within each ID. The expected output is:

   ID x  y
1   a 1  1
2   a 1 NA
3   a 3  3
4   a 2  2
5   a 1 NA
6   b 3  3
7   b 1  1
8   b 2  2
9   b 3 NA
10  b 2 NA

Solution

  • A slightly more adventurous option could be:

    library(dplyr)

    df %>%
     group_by(ID) %>%
     # TRUE^NA is 1 and FALSE^NA is NA, so repeated values within an ID become NA
     mutate(x_unique = x*(!duplicated(x))^NA)
    
       ID        x x_unique
       <chr> <int>    <dbl>
     1 a         1        1
     2 a         1       NA
     3 a         3        3
     4 a         2        2
     5 a         1       NA
     6 b         3        3
     7 b         1        1
     8 b         2        2
     9 b         3       NA
    10 b         2       NA
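
The trick above relies on R's rule that 1^NA is 1 while 0^NA is NA, which is also why x_unique comes back as a double rather than an integer. If that feels too clever, a more conventional version is sketched below in base R. It is only one possible alternative, not part of the original answer. It assumes the data shown in the question (rebuilt by hand here, since sample() can give different draws across R versions) and uses the column name y from the question.

```r
# Rebuild the example data exactly as printed in the question.
df <- data.frame(
  ID = rep(c("a", "b"), each = 5),
  x  = c(1L, 1L, 3L, 2L, 1L, 3L, 1L, 2L, 3L, 2L)
)

# Why the ^NA trick works: logicals are coerced to 1/0 before exponentiation.
first_seen  <- TRUE^NA   # 1^NA is 1
seen_before <- FALSE^NA  # 0^NA is NA

# Conventional alternative: within each ID, blank out repeated values of x.
# ave() applies FUN to x split by ID and returns the pieces in original row order.
df$y <- ave(df$x, df$ID, FUN = function(v) replace(v, duplicated(v), NA))

print(df)
```

Unlike the multiplication approach, replace() keeps y as an integer column. The same replace(x, duplicated(x), NA) expression should also work inside mutate() after group_by(ID).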