r dplyr data.table

Flag observations before and after a specific value in another column


Say I have a df:

df <- data.frame(flag = rep(0, 20),
                 include = rep(1, 20))
df$flag[c(4, 8, 16)] <- 1
df

   flag include
1     0       1
2     0       1
3     0       1
4     1       1
5     0       1
6     0       1
7     0       1
8     1       1
9     0       1
10    0       1
11    0       1
12    0       1
13    0       1
14    0       1
15    0       1
16    1       1
17    0       1
18    0       1
19    0       1
20    0       1

What I want to do is set include to 0 for every row that lies within two rows (before or after) of a row where flag == 1, while the flagged rows themselves keep include == 1. The result would look like:

   flag include
1     0       1
2     0       0
3     0       0
4     1       1
5     0       0
6     0       0
7     0       0
8     1       1
9     0       0
10    0       0
11    0       1
12    0       1
13    0       1
14    0       0
15    0       0
16    1       1
17    0       0
18    0       0
19    0       1
20    0       1

I've thought of some 'innovative' (read: inefficient and overcomplicated) ways to do it, but I suspect there is a simple way I'm overlooking.

It would be nice if the answer generalized to +/- n rows, since I have a lot more data and may want to search within +/- 10 rows.


Solution

  • Another option with data.table:

    library(data.table)
    n = 2
    # row numbers where flag is 1
    flag_one = which(df$flag == 1)
    
    # rows within n of a flagged row, excluding the flagged rows themselves;
    # outer() builds every flag_one + offset combination for offsets -n..n
    idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
    
    # drop out-of-range indices, then update include by reference
    # (setDT() converts df to a data.table in place)
    setDT(df)[idx[idx >= 1 & idx <= nrow(df)], include := 0][]
    
    # or, as @Frank commented, the last step in base R would be
    # df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
    
    #    flag include
    # 1:    0       1
    # 2:    0       0
    # 3:    0       0
    # 4:    1       1
    # 5:    0       0
    # 6:    0       0
    # 7:    0       0
    # 8:    1       1
    # 9:    0       0
    #10:    0       0
    #11:    0       1
    #12:    0       1
    #13:    0       1
    #14:    0       0
    #15:    0       0
    #16:    1       1
    #17:    0       0
    #18:    0       0
    #19:    0       1
    #20:    0       1
    
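    To see why the index step works, it can help to print the intermediate values. The following is only an illustrative sketch: it hard-codes the flagged rows from the question (4, 8, 16) with n = 2, and the name `offsets` is introduced here for the demonstration and is not part of the answer above.

    ```r
    n <- 2
    flag_one <- c(4, 8, 16)          # rows where flag == 1 in the example

    # One matrix row per flagged position, one column per offset in -n:n
    offsets <- outer(flag_one, -n:n, "+")
    offsets
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    2    3    4    5    6
    # [2,]    6    7    8    9   10
    # [3,]   14   15   16   17   18

    # setdiff() flattens the matrix, drops duplicates (6 appears twice)
    # and removes the flagged rows themselves
    idx <- setdiff(offsets, flag_one)
    sort(idx)
    # [1]  2  3  5  6  7  9 10 14 15 17 18

    # A flag near the edge produces indices outside 1..nrow(df),
    # which is why the range filter is applied before subsetting
    as.vector(outer(1, -n:n, "+"))
    # [1] -1  0  1  2  3
    ```

    The sorted indices match the rows whose include is 0 in the desired output.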

    Wrapped in a function that uses only base R, so n can be varied:

    update_n <- function(df, n) {
        flag_one = which(df$flag == 1)
        idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
        df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
        df
    }
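
    A quick usage check, as a sketch: it rebuilds the example data from the question and repeats `update_n()` so the snippet runs on its own. The names `res2` and `res10` are made up for this illustration.

    ```r
    # Rebuild the example data from the question
    df <- data.frame(flag = rep(0, 20), include = rep(1, 20))
    df$flag[c(4, 8, 16)] <- 1

    # update_n() as defined above, repeated so this snippet is self-contained
    update_n <- function(df, n) {
        flag_one <- which(df$flag == 1)
        idx <- setdiff(outer(flag_one, -n:n, "+"), flag_one)
        df$include[idx[idx >= 1 & idx <= nrow(df)]] <- 0
        df
    }

    # n = 2 reproduces the desired output from the question
    res2 <- update_n(df, 2)
    which(res2$include == 0)
    # [1]  2  3  5  6  7  9 10 14 15 17 18

    # n = 10: every unflagged row is now within range of some flag
    res10 <- update_n(df, 10)
    which(res10$include == 1)
    # [1]  4  8 16
    ```

    Because the function returns a modified copy, the original `df` is left unchanged and can be reused with different values of n.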