r tidyverse

Mutate variable conditional on first unique occurrence of another variable


I want to create a variable that identifies the first occurrence of each value in a column, but I cannot get the code to work.

The new variable should mark only a non-NA value that is the first occurrence of that value, and it should ideally work within a piped code chunk.

I have tried lag(), but it only looks at a single previous value, whereas I want to compare each value with ALL preceding values in the column.

I have tried rolling windows without success, and I have also tried a simpler solution, but cannot get it to work either:

example:

df <- data.frame(index = c(NA,NA,1,NA,NA,1,2,NA,2,NA))
# Now add new column
df %>% mutate(Var = ifelse(!is.na(index & !index %in% index[1:nrow(.)],1,0))

Desired output:

|index|Var|
|----|----|
| NA | 0 |  
| NA | 0 |  
| 1  | 1 |    
| NA | 0 |
| NA | 0 |
| 1  | 0 |
| 2  | 1 |
| NA | 0 |
| 2  | 0 |
| NA | 0 |

Solution

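  • One idea is to first create a flag (new) that marks the non-NA values with 1 * (!is.na(index)), where multiplying by 1 converts TRUE/FALSE to 1/0, and then use replace() to set the flag to 0 wherever duplicated(index) is TRUE, i.e. wherever the value has already appeared earlier in the column.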

    library(tidyverse)
    
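    df %>% 
      # flag every non-NA value (TRUE/FALSE becomes 1/0)
      mutate(new = 1 * (!is.na(index)), 
             # reset the flag to 0 where the value was already seen earlier
             new = replace(new, duplicated(index), 0))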
    
       index new
    1     NA   0
    2     NA   0
    3      1   1
    4     NA   0
    5     NA   0
    6      1   0
    7      2   1
    8     NA   0
    9      2   0
    10    NA   0
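
The two steps above can probably be collapsed into a single condition: a row is flagged when its value is non-NA and has not been seen earlier in the column. The following is only a sketch of that variant, not part of the original answer. The column name Var is taken from the question's desired output, and the name result and the final check are added here purely for illustration.

```r
library(dplyr)

df <- data.frame(index = c(NA, NA, 1, NA, NA, 1, 2, NA, 2, NA))

# Single-step variant: flag a row when it is non-NA AND its value
# has not appeared earlier in the column (duplicated() is FALSE).
result <- df %>%
  mutate(Var = as.integer(!is.na(index) & !duplicated(index)))

# Compare against the desired output listed in the question
expected <- c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)
stopifnot(identical(result$Var, expected))

print(result)
```

Note that duplicated() treats repeated NA values as duplicates too, so the !is.na(index) part is still needed to keep the very first NA from being flagged.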