Search code examples
rdplyrinterpolationmissing-dataimputation

Interpolate zero values only if one zero and surrounding values are bigger than zero


I want to interpolate zero values in a time series dataframe but only if: 1) there is only one missing value so subsequent and proceeding values are non-zero, 2) the surrounding non-zero values are higher than 2.

See example below. Thanks a lot!

df = data.frame(row.names = c("year 1", "year 2", "year 3", "year 4", "year 5"), 
     person1 = c(33,0,8,6,3), person2 = c(1,3,0,0,5), person3 = c(0,3,0,5,1))

before:

       person1 person2 person3
year 1      33       1       0
year 2       0       3       3
year 3       8       0       0
year 4       6       0       2
year 5       3       5       1

desired result:

       person1 person2 person3
year 1      33       1       0
year 2     *20.5*    3       3
year 3       8       0       0
year 4       6       0       2
year 5       3       5       1

I have tried different variations of lead & lag mutations and the na.approx function but to no avail.


Solution

  • Here's a one-liner using the tidyverse:

    library(tidyverse)
    
    df <- data.frame(row.names = c("year 1", "year 2", "year 3", "year 4", "year 5"), person1 = c(33,0,8,6,3), person2 = c(1,3,0,0,5), person3 = c(0,3,0,5,1))
    
    df |> 
      mutate(
        across(everything(), \(x) ifelse(x == 0 & lag(x, default = 0) > 2 & lead(x, default = 0) > 2, rowMeans(cbind(lag(x), lead(x))), x))
      )
      
           person1 person2 person3
    year 1    33.0       1       0
    year 2    20.5       3       3
    year 3     8.0       0       4
    year 4     6.0       0       5
    year 5     3.0       5       1