Tags: r, interpolation, spline, linear-interpolation

Natural interpolation in R


I'm trying to interpolate data using the spline function from R's stats package.

Specifically, I have a dataset like the following:

DATE    Y
01/01/2020  
02/01/2020  0.705547512
04/01/2020  0.760723591
06/01/2020  0.014017642
07/01/2020  
09/01/2020  0.579518616
10/01/2020  
12/01/2020  0.7747401
15/01/2020  0.289562464
19/01/2020      

I would like to interpolate the missing values from the other values (e.g. the Y values for January 1st, January 7th, ...). The goal is to fill in these missing values. Searching online, I found the R spline function, which should do this.

Can someone help me compute the interpolated values? Thanks in advance.

I tried the following R code to interpolate the missing data:

SPLINE <- spline(x = df[2],
                 y = df[1],
                 method = "natural")$y

The result is a numeric vector with 3 elements, all equal to 10. I don't understand the logic behind this interpolation. I expected a vector with 10 elements, each equal to the original Y value, except for the entries for 2020-01-07, 2020-01-10 and 2020-01-19, which were missing and which spline should fill in using the selected method.


Solution

  • It's difficult to tell what your problem is because your data isn't reproducible. Are those really empty cells in your data frame? A numeric column can't have empty cells; missing values would have to be NA. If they look empty when you print the data frame, then it is a character column and must be converted to numeric, or spline won't work. Also, are those real Date objects, or just character strings that represent dates? Again, if they are character strings, spline won't work.

    Let's take your example data as given:

    df <- read.table(text = "
    DATE    Y
    01/01/2020  ''
    02/01/2020  0.705547512
    04/01/2020  0.760723591
    06/01/2020  0.014017642
    07/01/2020  ''
    09/01/2020  0.579518616
    10/01/2020  ''
    12/01/2020  0.7747401
    15/01/2020  0.289562464
    19/01/2020  ''
    ", header = TRUE)
    

    Now we convert to the correct formats:

    df$DATE <- as.Date(df$DATE, format = '%d/%m/%Y')
    df$Y    <- as.numeric(df$Y)
    

    After that conversion, spline works fine. Let's use it to generate a smooth curve of 100 points:

    SPLINE  <- spline(x = df$DATE, y = df$Y, n = 100, method = 'natural')
    
    plot(df$DATE, df$Y, ylim = c(-0.1, 1))
    lines(SPLINE$x, SPLINE$y)
    

    Created on 2023-09-01 with reprex v2.0.2
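The plot above draws a 100-point curve, but the question asked for a 10-row result that keeps the observed values and fills only the gaps. A minimal sketch of one way to get that, assuming the same df as in the answer (rebuilt here so the block runs on its own): pass the original dates to spline through its xout argument, so the curve is evaluated exactly at those dates. The column name Y_filled is hypothetical, chosen only for illustration.

```r
# Rebuild the example data from the answer: '' marks a missing Y value
df <- read.table(text = "
DATE    Y
01/01/2020  ''
02/01/2020  0.705547512
04/01/2020  0.760723591
06/01/2020  0.014017642
07/01/2020  ''
09/01/2020  0.579518616
10/01/2020  ''
12/01/2020  0.7747401
15/01/2020  0.289562464
19/01/2020  ''
", header = TRUE)

# Convert to proper types: Date for x, numeric (with NA gaps) for y
df$DATE <- as.Date(df$DATE, format = "%d/%m/%Y")
df$Y    <- as.numeric(df$Y)

# Evaluate the natural spline at the original dates via xout, instead of
# on an evenly spaced grid; pairs with NA in Y are left out of the fit
fit <- spline(x = df$DATE, y = df$Y, xout = df$DATE, method = "natural")

# Keep the observed values and fill only the missing ones
# (Y_filled is a hypothetical column name)
df$Y_filled <- ifelse(is.na(df$Y), fit$y, df$Y)
print(df)
```

Because an interpolating spline passes through every observed point, fit$y already matches Y on the observed dates; ifelse just makes that explicit. Note that January 1st and January 19th fall outside the range of observed dates (January 2nd to January 15th), so those two values are extrapolated rather than interpolated. For method = "natural", R's splinefun documentation describes this extrapolation as linear, using the slope at the nearest data point, so treat those two filled values with extra caution.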