
How to match one row from one column to the next 5-10 rows in two other columns in R?


I have a data frame which looks like this:

    df1 <- structure(list(
      day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
      observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
      observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
      observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)),
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

Previously, I got a TRUE value when observ1 equals 1 and, 5 to 10 days later, observ2 also equals 1.
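For reference, the earlier two-condition rule can be written as a small loop. This is a minimal sketch, not the code from the original question: the helper name `flag_obs1_then_obs2` and its `lag_min`/`lag_max` parameters are hypothetical, and it assumes "5 to 10 days later" means rows i + 5 through i + 10 inclusive, with the flag placed on the observ1 row.

```r
# Observations from df1 above, as plain vectors
observ1 <- c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
observ2 <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)

# Hypothetical helper: TRUE on rows where observ1 == 1 and observ2 == 1
# on at least one row between i + lag_min and i + lag_max (inclusive).
flag_obs1_then_obs2 <- function(observ1, observ2, lag_min = 5, lag_max = 10) {
  n <- length(observ1)
  vapply(seq_len(n), function(i) {
    if (observ1[i] != 1) return(FALSE)
    # Rows i + 5 ... i + 10, dropping any that fall past the last row
    idx <- (i + lag_min):(i + lag_max)
    idx <- idx[idx <= n]
    any(observ2[idx] == 1)
  }, logical(1))
}

which(flag_obs1_then_obs2(observ1, observ2))
#> [1]  6 14
```

Day 1 is not flagged because none of days 6 to 11 has observ2 == 1.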

Now I need to add a third condition: if observ1 equals 1, and 5-10 days later observ2 equals 1 AND observ3 also equals 1 within the same 5-10 day window, then return TRUE.

So, the new 'check' column should look like this:

    df1 <- structure(list(day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), 
      observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), 
      observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1), 
      observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), 
      check = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'TRUE', 0, 0, 0, 0, 0, 0)), 
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))
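In the expected output above, TRUE sits on day 14: the row where the matching observ2 event happens, 8 days after observ1 on day 6, with observ3 == 1 on day 12, inside the same day 11 to day 16 window. The sketch below reproduces that reading. It is an assumption about the intended rule, not a confirmed specification, and the helper `flag_three_way` is hypothetical.

```r
observ1 <- c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
observ2 <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
observ3 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)

# Hypothetical helper: TRUE on row j when observ2[j] == 1, some row k with
# observ1[k] == 1 lies 5 to 10 rows before j, and observ3 == 1 somewhere
# in rows k + 5 ... k + 10 (the same window that contains j).
flag_three_way <- function(observ1, observ2, observ3, lag_min = 5, lag_max = 10) {
  n <- length(observ1)
  vapply(seq_len(n), function(j) {
    if (observ2[j] != 1) return(FALSE)
    # Candidate start rows k; drop indices below 1 before indexing observ1
    starts <- (j - lag_max):(j - lag_min)
    starts <- starts[starts >= 1]
    starts <- starts[observ1[starts] == 1]
    # Does any start row have an observ3 hit in its own 5-10 row window?
    any(vapply(starts, function(k) {
      idx <- (k + lag_min):(k + lag_max)
      any(observ3[idx[idx <= n]] == 1)
    }, logical(1)))
  }, logical(1))
}

check <- flag_three_way(observ1, observ2, observ3)
which(check)
#> [1] 14
```

Day 20 is not flagged: its observ1 start (day 14) has no observ3 hit on days 19 to 20.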

Solution

  • Hopefully this helps. Thanks for asking this as a new question; that is generally the way to go when you need to extend your original question. I'm not sure this is correct, so please let me know whether or not this is what you are after.

    df1$check <- with(
      df1, 
      vapply(
        seq_along(observ1),
        function(i){
          # For the first five rows:
          if(i - 5 <= 0){
            # Return NA (logical scalar)
            NA
          # Otherwise:
          }else{
            # Window bounds, clamped to the valid rows 1..length(observ1);
            # note the window runs from 10 rows before i to 5 rows after i
            idx_lower_bound <- max(i - 10, 1)
            idx_upper_bound <- min(i + 5, length(observ1))
            # Row indices in the window: idx => numeric vector
            idx <- seq(
              idx_lower_bound,
              idx_upper_bound,
              by = 1
            )
            # Test if all conditions are true: 
            # check => logical scalar
            check <- all(
              # The current value of observ1 == 1 ? logical scalar
              observ1[i] == 1,
              # Any observ2 values in the range == 1 ? logical scalar
              any(observ2[idx] == 1),
              # Any observ3 values in the range == 1 ? logical scalar
              any(observ3[idx] == 1)
            )
            # Replace FALSE with NA: logical scalar
            if (check) TRUE else NA
          }
        },
        logical(1)
      )
    )
    

    Data:

    df1 <- structure(
      list(
        day = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), 
        observ1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), 
        observ2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1), 
        observ3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
        ),
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L)
    )
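The per-row vapply loop in this answer can also be vectorized with cumulative sums, which count the hits in every window at once. This sketch makes a different assumption from the code above: it uses the strictly forward window described in the question (rows i + 5 to i + 10) instead of rows i - 10 to i + 5, and it flags the observ1 row. The helper name `window_hits` is hypothetical. On this data it flags day 6 rather than day 14, so it is worth confirming which row the flag should sit on.

```r
observ1 <- c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
observ2 <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
observ3 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)

# Hypothetical helper: for every row i, count the rows in
# i + lag_min ... i + lag_max where x == 1. Rows past the end count as 0.
window_hits <- function(x, lag_min = 5, lag_max = 10) {
  n <- length(x)
  cs <- c(0, cumsum(x == 1))      # cs[m + 1] = number of hits in rows 1..m
  i <- seq_len(n)
  hi <- pmin(i + lag_max, n)      # last row of the window, clamped to n
  lo <- pmin(i + lag_min - 1, n)  # row just before the window, clamped to n
  cs[hi + 1] - cs[lo + 1]
}

check <- observ1 == 1 & window_hits(observ2) > 0 & window_hits(observ3) > 0
which(check)
#> [1] 6
```

The prefix-sum trick keeps the whole computation vectorized, so its cost grows linearly with the number of rows instead of with rows times window width.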