
Interpolate with splines without exceeding the next value in R


I have a dataset of accumulated (cumulative) data. I am trying to interpolate some missing values, but at some points the interpolated value is greater than the next observed value. This is an example of my data:

library(tibble)

dat <- tibble(day = 1:30,
              value = c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
                        383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
                        NA, NA, 487, 487, 487, 487))
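Because the series is cumulative, the observed values should never decrease; that is the precondition any monotone interpolation relies on. A minimal check (a sketch; the names `observed` and `is_cumulative` are illustrative) could look like this:

```r
library(tibble)

dat <- tibble(day = 1:30,
              value = c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
                        383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
                        NA, NA, 487, 487, 487, 487))

# Days whose value is missing and needs to be interpolated
missing_days <- which(is.na(dat$value))

# Observed (non-NA) values, kept in day order
observed <- dat$value[!is.na(dat$value)]

# Cumulative data: consecutive observations must be non-decreasing
is_cumulative <- all(diff(observed) >= 0)

missing_days
is_cumulative
```

Here `missing_days` is 4, 5, 8, 11, 12, 18, 19, 25, 26 and `is_cumulative` is `TRUE`.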

My dataset is quite long, and when I use smooth.spline to interpolate the missing values I get values greater than the next observation, which is absurd given that I am dealing with accumulated data. This is the output I get:

value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335, 
359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823, 
444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)
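One quick way to locate the problem in the smoothed output is to look for negative day-to-day increments, since a cumulative series should never step down. A minimal sketch using the vector above (`violations` is an illustrative name):

```r
# smooth.spline output copied from the question
value.smspl <- c(278, 278, 278, 287.7574, 295.2348, 302, 316, 326.5689, 335,
                 359, 364.7916, 377.3012, 383, 403, 419, 419, 444, 439.765, 447.1823,
                 444, 464, 487, 487, 487, 521.6235, 526.3715, 487, 487, 487, 487)

# Position i is flagged when the value on day i is larger than on day i + 1
violations <- which(diff(value.smspl) < 0)
violations
```

The flagged days are 17 (444 followed by 439.765), 19 (447.1823 followed by the observed 444) and 26 (526.3715 followed by the observed 487).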

[Plot: smooth.spline interpolation of the data]

My question is: can the interpolation be constrained so that no interpolated value exceeds the next observation and the result is reliable? If so, how?


Solution

  • Your data are monotonic (non-decreasing), so you can use the "hyman" method in spline(), which fits a monotone cubic spline:

    x <- dat$day
    yi <- y <- dat$value
    naInd <- is.na(y)
    ## monotone (Hyman-filtered) cubic spline through the observed points, evaluated at the NA days
    yi[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd], method = "hyman")$y
    
    plot(x, y, pch = 19)  ## non-NA data (black)
    points(x[naInd], yi[naInd], pch = 19, col = 2)  ## interpolation at NA (red)
    

    [Plot: observed data (black) and "hyman" spline interpolation at the NA days (red)]


    Package zoo has several functions for filling NA values, one of which is na.spline(). As G. Grothendieck (a wizard for time series) suggested, the following does the same:

    library(zoo)
    library(dplyr)
    ## for a plain vector, na.spline() uses positions 1..n as x, which match `day` here
    dat %>% mutate(value.interp = na.spline(value, method = "hyman"))
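As a sanity check, both routes should fill the NA days with values that never decrease, and the base-R and zoo results should agree. This is a sketch assuming the `dat` from the question and that zoo is installed; `yi_base` and `yi_zoo` are illustrative names.

```r
library(tibble)
library(zoo)

dat <- tibble(day = 1:30,
              value = c(278, 278, 278, NA, NA, 302, 316, NA, 335, 359, NA, NA,
                        383, 403, 419, 419, 444, NA, NA, 444, 464, 487, 487, 487,
                        NA, NA, 487, 487, 487, 487))

x <- dat$day
y <- dat$value
naInd <- is.na(y)

## Base R: Hyman-filtered (monotone) cubic spline evaluated at the missing days
yi_base <- y
yi_base[naInd] <- spline(x[!naInd], y[!naInd], xout = x[naInd],
                         method = "hyman")$y

## zoo: on a plain vector na.spline() uses positions 1..30 as x (equal to day)
yi_zoo <- as.numeric(na.spline(y, method = "hyman"))

## Never decreasing, so no filled value exceeds a later observation
all(diff(yi_base) >= 0)
## Both routes give the same filled series (up to floating-point tolerance)
isTRUE(all.equal(yi_base, yi_zoo))
```

With the Hyman filter, flat stretches such as the two NA days between the observed 444s stay flat instead of overshooting, which is exactly where smooth.spline went wrong.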