Tags: r, if-statement, dplyr, binary-data, mutate

Create a binary column indicating the first success


Fish_ID Instance Success First_Success
1 0 0 0
1 1 0 0
1 2 1 1
1 3 0 0
1 4 1 0
1 5 0 0

I have a data frame with the columns "Fish_ID", "Instance", and "Success", and I want to create a column "First_Success" that contains a 1 only at the first success within each Fish_ID group and 0 otherwise. The table above is an example; the actual data has more than one Fish_ID. I came up with this code:

mydata %>%
  group_by(Fish_ID) %>%
  mutate(First_Success = ifelse(Success == 1, 1, 0))

But this code marks every row where Success equals 1. I want only the first success within each Fish_ID group to get a 1 in the "First_Success" column, and every other row in the group to get 0. I would appreciate any suggestions. Thanks a lot!


Solution

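  • You can use cumsum(). A row is the first success when Success is 1 and the running total of successes within the fish equals 1: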

    library(dplyr)
    
    df %>%
      mutate(First_Success = +(Success & cumsum(Success) == 1),
             .by = Fish_ID)
    
    #   Fish_ID Instance Success First_Success
    # 1       1        0       0             0
    # 2       1        1       0             0
    # 3       1        2       1             1
    # 4       1        3       0             0
    # 5       1        4       1             0
    # 6       1        5       0             0
    

    A variant suggested by @langtang:

    df %>%
      mutate(First_Success = (cumsum(Success) == 1) * Success,
             .by = Fish_ID)
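
    To see why both versions work, here is a minimal walkthrough of the intermediate values for the single fish in the example, written in base R. The vectors `Success` and `running` are standalone names introduced only for this illustration, not part of the answer's code:

    ```r
    # Success values for fish 1, copied from the example data
    Success <- c(0, 0, 1, 0, 1, 0)

    # Running count of successes seen so far
    running <- cumsum(Success)
    running
    # [1] 0 0 1 1 2 2

    # TRUE from the first success until just before the second one
    running == 1
    # [1] FALSE FALSE  TRUE  TRUE FALSE FALSE

    # Multiplying by Success keeps only the row that is itself a success
    first_success <- (running == 1) * Success
    first_success
    # [1] 0 0 1 0 0 0
    ```

    The comparison alone stays TRUE on the failure row between the first and second success, which is why it has to be combined with Success itself, either with `&` or by multiplication.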
    

    Data

    df <- read.table(text =
    "Fish_ID    Instance    Success
    1   0   0
    1   1   0
    1   2   1
    1   3   0
    1   4   1
    1   5   0", header = TRUE)
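
    Since the real data has more than one Fish_ID, a quick check on several groups may be useful. The sketch below uses made-up data (`df2` and `res` are hypothetical names) and assumes the rows are already ordered by Instance within each fish. It uses group_by() instead of `.by`, because `.by` is only available from dplyr 1.1.0 onward:

    ```r
    library(dplyr)

    # Hypothetical data: fish 2 succeeds on its first instance,
    # fish 3 never succeeds
    df2 <- data.frame(
      Fish_ID  = c(1, 1, 1, 2, 2, 2, 3, 3),
      Instance = c(0, 1, 2, 0, 1, 2, 0, 1),
      Success  = c(0, 1, 1, 1, 0, 1, 0, 0)
    )

    res <- df2 %>%
      group_by(Fish_ID) %>%   # same grouping as .by = Fish_ID
      mutate(First_Success = +(Success & cumsum(Success) == 1)) %>%
      ungroup()

    res$First_Success
    # [1] 0 1 0 1 0 0 0 0
    ```

    The running total restarts for each fish, so every fish gets at most one 1, and a fish with no success gets only 0s.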