Tags: r, dataframe, dplyr, data-manipulation

Expand values of a column to n rows before and m rows after the value in a data frame


I have a data.frame representing different time series. In one column, I marked interesting time points (Note: There can be multiple interesting time points per Id):

Id Time Value Interesting
1 0 12 0
1 1 14 0
1 2 11 0
1 3 12 1
1 4 13 0
1 5 14 0
1 6 12 0
1 7 12 0
.. .. .. ..
78 128 13 0

Now I would also like to mark the n time points before and the m time points after each interesting point as an interesting block. So with n = 2 and m = 3 I would expect this:

Id Time Value Interesting Block
1 0 12 0 0
1 1 14 0 1
1 2 11 0 1
1 3 12 1 1
1 4 13 0 1
1 5 14 0 1
1 6 12 0 1
1 7 12 0 0
.. .. .. .. ..
78 128 13 0 0

At the moment, I use a gaussianSmooth() and a threshold:

df %>% mutate(Block = ifelse(gaussianSmooth(Interesting, sigma = 4) > 0.001, 1, 0))

But this is cumbersome and only works if n = m. Is there a “simpler” solution where I can easily set how many rows before and after should be changed? Solutions preferably in dplyr/tidyverse.


Solution

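  • With group_modify (works for multiple Interesting values too). Get the indices you need: here the positions where Interesting == 1. Then, for each position i, set the surrounding rows to 1 using the window max(1, i - n):min(nrow(.x), i + m). The bounds are clamped so the window never leaves the group; the lower bound is 1 because R indices start at 1. Note that this overwrites Interesting in place; assign to a new column such as Block instead if the original marks should be kept.

    library(dplyr)
    n <- 2  # rows to mark before each interesting point
    m <- 3  # rows to mark after each interesting point
    
    df %>% 
      group_by(Id) %>% 
      group_modify(~ {
        # positions of the interesting points within this Id
        idx <- which(.x$Interesting == 1)
        for (i in idx) {
          # clamp the window to the rows of this group
          .x$Interesting[max(1, i - n):min(nrow(.x), i + m)] <- 1
        }
        .x
      })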
    
    # A tibble: 8 × 4
    # Groups:   Id [1]
         Id  Time Value Interesting
      <int> <int> <int>       <dbl>
    1     1     0    12           0
    2     1     1    14           1
    3     1     2    11           1
    4     1     3    12           1
    5     1     4    13           1
    6     1     5    14           1
    7     1     6    12           1
    8     1     7    12           0
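
If the original Interesting column should be kept and the result written to a separate Block column, as in the expected output of the question, the same windowing idea can be wrapped in a small helper and used inside mutate(). This is only a sketch: expand_block is a hypothetical helper name (not part of dplyr), and the data frame below is rebuilt by hand from the eight rows shown for Id 1 in the question.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): returns a 0/1 vector that marks
# n positions before and m positions after every 1 in x.
expand_block <- function(x, n, m) {
  idx <- which(x == 1)
  # Collect every row position covered by a window, clamped to 1..length(x)
  hits <- unlist(lapply(idx, function(i) {
    seq(max(1, i - n), min(length(x), i + m))
  }))
  as.integer(seq_along(x) %in% hits)
}

# Example data: the rows for Id 1 from the question, typed in by hand
df <- data.frame(
  Id = 1L,
  Time = 0:7,
  Value = c(12L, 14L, 11L, 12L, 13L, 14L, 12L, 12L),
  Interesting = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L)
)

res <- df %>%
  group_by(Id) %>%   # windows must not cross from one Id into the next
  mutate(Block = expand_block(Interesting, n = 2, m = 3)) %>%
  ungroup()
```

Because the helper works on a plain vector, n and m can differ freely, and a group without any interesting point simply gets a Block of all zeros.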