Search code examples
rdplyrinterpolationpanel

Interpolation of panel data based on a flag column in r


I have a time-series panel dataset which is structured in the following way:

df <- data.frame(
  year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L),
  id = c(1L, 1L, 1L, 2L, 2L, 2L),
  c = c(11L, 13L, 13L, 16L, 15L, 15L),
flag = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

#>   year id  c flag
#> 1 2012  1 11 FALSE
#> 2 2013  1 26 TRUE
#> 3 2014  1 13 FALSE
#> 4 2012  2 16 FALSE
#> 5 2013  2 15 FALSE
#> 6 2014  2 19 FALSE

I want to do a linear interpolation where the value in column c is interpolated by averaging the value before and after it whenever there is a corresponding TRUE value in the flag column. e.g. since the value 26 has a TRUE flag I want it to be replaced with the average of 11 and 13 which is 12. I tried to write a function but I have a bit of trouble implementing it properly in the panel data setting.


Solution

  • You can replace c values where flag is TRUE with NA and then interpolate values with na.approx from zoo.

    library(dplyr)
    
    df %>% mutate(c = zoo::na.approx(replace(c, flag, NA)))
    
    #  year id  c  flag
    #1 2012  1 11 FALSE
    #2 2013  1 12  TRUE
    #3 2014  1 13 FALSE
    #4 2012  2 16 FALSE
    #5 2013  2 15 FALSE
    #6 2014  2 15 FALSE