R dataframe: filling specific number of rows above and below where conditions are met


I am working in R and have a dataframe with a DateTime column and a Binary column marking an event in time, as in this sample dataframe:

DateTime <- seq(from = as.POSIXct("2021-01-01 00:00:00"), to = as.POSIXct("2021-01-01 17:00:00"), by = "hour")
Binary <- c(NA, 1, rep(NA, 5), 1, rep(NA, 5), 1, rep(NA, 4))
sample <- data.frame(DateTime, Binary)

I want to create a new column that assigns 'H' wherever the Binary column contains '1', as well as to a given number of rows above and below each '1'. For the sake of this example, that is 1 row above and below, as shown in the 'goal' dataframe:

Height <- c(rep('H', 3), rep(NA, 3), rep('H',3), rep(NA, 3), rep('H', 3), rep(NA, 3))
goal <- data.frame(DateTime, Binary, Height)

I can achieve this using a for loop. However, it is very slow as the actual dataset I have is very large (almost 1 million observations). Here is an example of the for loop I was using:

# create new column Height
sample$Height <- NA_character_
n <- nrow(sample)
# Use a for loop to assign "H" to each event row and its neighbours
for (i in seq_len(n)) {
  if (sample$Binary[i] %in% 1) {
    # clamp the window so it never indexes outside rows 1..n
    sample$Height[max(i - 1, 1):min(i + 1, n)] <- "H"
  }
}

I can use dplyr to assign 'H' in the rows where there is a '1' in the Binary column.

library(dplyr)

sample <- sample %>%
  mutate(Height = ifelse(Binary == 1, "H", NA))

However, is there a way to fill a specified number of rows above and below this too?

I also considered using fill() following the above step:

sample <- tidyr::fill(sample, Height, .direction = "updown")

But of course, this fills all the NAs, which I do not want.


Solution

    DateTime <- seq(from = as.POSIXct("2021-01-01 00:00:00"), to = as.POSIXct("2021-01-01 17:00:00"), by = "hour")
    Binary <- c(NA, 1, rep(NA, 5), 1, rep(NA, 5), 1, rep(NA, 4))
    sample <- data.frame(DateTime, Binary)
    
    Height <- c(rep('H', 3), rep(NA, 3), rep('H',3), rep(NA, 3), rep('H', 3), rep(NA, 3))
    goal <- data.frame(DateTime, Binary, ExpectedHeight = Height)
    
    library(dplyr)    
    
    goal %>% 
      mutate(
        Height = case_when(
          # compare each term explicitly: == binds tighter than |
          Binary == 1 | lag(Binary) == 1 | lead(Binary) == 1 ~ "H",
          TRUE ~ NA_character_
        )
      )
    
                  DateTime Binary ExpectedHeight Height
    1  2021-01-01 00:00:00     NA              H      H
    2  2021-01-01 01:00:00      1              H      H
    3  2021-01-01 02:00:00     NA              H      H
    4  2021-01-01 03:00:00     NA           <NA>   <NA>
    5  2021-01-01 04:00:00     NA           <NA>   <NA>
    6  2021-01-01 05:00:00     NA           <NA>   <NA>
    7  2021-01-01 06:00:00     NA              H      H
    8  2021-01-01 07:00:00      1              H      H
    9  2021-01-01 08:00:00     NA              H      H
    10 2021-01-01 09:00:00     NA           <NA>   <NA>
    11 2021-01-01 10:00:00     NA           <NA>   <NA>
    12 2021-01-01 11:00:00     NA           <NA>   <NA>
    13 2021-01-01 12:00:00     NA              H      H
    14 2021-01-01 13:00:00      1              H      H
    15 2021-01-01 14:00:00     NA              H      H
    16 2021-01-01 15:00:00     NA           <NA>   <NA>
    17 2021-01-01 16:00:00     NA           <NA>   <NA>
    18 2021-01-01 17:00:00     NA           <NA>   <NA>
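
The lag()/lead() approach above covers exactly one row on each side. For a window of x rows, a vectorised base R helper is one possible sketch. The function name `flag_window` and its argument `k` are made up for this illustration and are not part of dplyr or of the original answer. It assumes events are coded as 1 and everything else (including NA) is a non-event.

```r
# Hypothetical helper: mark every position within k rows of an event with "H".
flag_window <- function(event, k = 1) {
  n <- length(event)
  hits <- which(event %in% 1)            # %in% treats NA as "no event"
  # all indices from (hit - k) to (hit + k), computed for every hit at once
  idx <- unique(as.vector(outer(hits, -k:k, "+")))
  idx <- idx[idx >= 1 & idx <= n]        # drop indices that fall off either end
  out <- rep(NA_character_, n)
  out[idx] <- "H"
  out
}

Binary <- c(NA, 1, rep(NA, 5), 1, rep(NA, 5), 1, rep(NA, 4))
Height_k1 <- flag_window(Binary, k = 1)  # one row above and below each event
Height_k2 <- flag_window(Binary, k = 2)  # two rows above and below each event
```

Because the helper takes a vector and returns a vector of the same length, it should also work inside a pipeline, for example `mutate(Height = flag_window(Binary, k = 1))`. There is no row-by-row loop, so it is likely to scale better to around a million rows than the for loop, though that has not been benchmarked here.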