Interpolation for continuous data in R


I have sample data as follows:

data1 <- read.table(text="1/1/12 1:48 AM  1.24
1/1/12 8:14 AM  0.26
1/1/12 2:01 PM  1.15
1/1/12 8:25 PM  0.15
1/2/12 2:36 AM  1.23
1/2/12 9:13 AM  0.25
1/2/12 2:54 PM  1.09
1/2/12 9:17 PM  0.16
1/3/12 3:28 AM  1.24
1/3/12 10:06 AM 0.21
1/3/12 3:52 PM  1.07
1/3/12 10:05 PM 0.15
1/4/12 4:21 AM  1.27
1/4/12 10:56 AM 0.16
1/4/12 4:49 PM  1.08
1/4/12 10:52 PM 0.12
1/5/12 5:12 AM  1.32
1/5/12 11:43 AM 0.1
1/5/12 5:41 PM  1.12
1/5/12 11:37 PM 0.08
1/6/12 5:58 AM  1.38
1/6/12 12:28 PM 0.03
1/6/12 6:27 PM  1.17
", sep="", header=FALSE)

Combine the first three columns into one to make a date column:

data1$date <- paste(data1$V1, data1$V2, data1$V3)

    > head(data1)
          V1   V2 V3   V4           date
    1 1/1/12 1:48 AM 1.24 1/1/12 1:48 AM
    2 1/1/12 8:14 AM 0.26 1/1/12 8:14 AM
    3 1/1/12 2:01 PM 1.15 1/1/12 2:01 PM
    4 1/1/12 8:25 PM 0.15 1/1/12 8:25 PM
    5 1/2/12 2:36 AM 1.23 1/2/12 2:36 AM
    6 1/2/12 9:13 AM 0.25 1/2/12 9:13 AM

Create a date sequence to interpolate over:

daterange <- seq(from=as.POSIXct("2012-1-1 00:00"), to = as.POSIXct("2012-1-6 00:00"), length.out =1200)

I want to find the V4 value corresponding to each time in the daterange above, using linear interpolation.


Solution

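  • As others have said, you can use approx(...) to interpolate between successive points, although it's debatable whether this is a good idea: the observations alternate between highs and lows, so straight lines between them produce a sawtooth that may not resemble the underlying signal.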

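    # Parse the 12-hour timestamps; %I is the 12-hour clock and %p the AM/PM marker.
    data1$posix <- as.POSIXct(data1$date,format="%m/%d/%y %I:%M %p")
    # With n=1200, approx() returns 1200 evenly spaced points between the first and last observation.
    df <- as.data.frame(with(data1,approx(posix,V4,n=1200)))  # colnames are "x", "y"
    colnames(df) <- c("date","V4")
    # approx() returns x as seconds since the epoch, so convert back to POSIXct.
    df$posix     <- as.POSIXct(df$date,origin="1970-01-01")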
    
    library(ggplot2)
    ggplot()+
      geom_point(data=data1, aes(x=posix, y=V4), color="red", size=5)+
      geom_point(data=df,    aes(x=posix, y=V4), color="blue", size=1)+
      labs(x="Date")
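
The call above lets approx() pick n evenly spaced points between the first and last observation. To get values at the exact times of a chosen grid, such as the question's daterange, approx() also accepts an xout argument. The following is a minimal sketch under assumptions that are not in the original: it uses only the first four observations, an hourly grid over the first day, and tz = "UTC" so the result is reproducible; obs, grid and res are illustrative names.

```r
# First four observations from the question, with the date already pasted together.
obs <- data.frame(
  date = c("1/1/12 1:48 AM", "1/1/12 8:14 AM", "1/1/12 2:01 PM", "1/1/12 8:25 PM"),
  V4   = c(1.24, 0.26, 1.15, 0.15),
  stringsAsFactors = FALSE
)
# tz = "UTC" is an assumption; the original relies on the local time zone.
obs$posix <- as.POSIXct(obs$date, format = "%m/%d/%y %I:%M %p", tz = "UTC")

# Hypothetical target grid: every hour from midnight to midnight.
grid <- seq(from = as.POSIXct("2012-01-01 00:00", tz = "UTC"),
            to   = as.POSIXct("2012-01-02 00:00", tz = "UTC"),
            by   = "hour")

# xout evaluates the linear interpolant at the requested times.
# With the default rule = 1, times outside the observed range give NA.
res <- data.frame(
  date = grid,
  V4   = approx(x = as.numeric(obs$posix), y = obs$V4,
                xout = as.numeric(grid))$y
)

head(res)
```

With the question's own objects the equivalent call would be approx(data1$posix, data1$V4, xout = daterange)$y. Grid times before the first observation (here, before 1:48 AM) come back as NA under rule = 1; rule = 2 would instead carry the nearest observed value outward.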
    

    Note the format string in the call to as.POSIXct(...). You have to specify that the times are in 12-hour format using %I (not %H), and that the string contains AM/PM using %p; otherwise your character times will not convert correctly. They will still convert without throwing an error, so be careful.
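
A small check makes the silent failure concrete. The sketch below parses one afternoon timestamp from the data with both format strings. The variable names and tz = "UTC" are assumptions added for reproducibility, and matching "PM" with %p depends on the locale (it works in English and C locales).

```r
stamp <- "1/1/12 2:01 PM"

# %H reads "2" as a 24-hour value and the trailing " PM" is ignored:
# no error is raised, but the time is wrong.
wrong <- as.POSIXct(stamp, format = "%m/%d/%y %H:%M", tz = "UTC")

# %I together with %p reads the 12-hour clock and the AM/PM marker.
right <- as.POSIXct(stamp, format = "%m/%d/%y %I:%M %p", tz = "UTC")

format(wrong, "%H:%M")  # "02:01"
format(right, "%H:%M")  # "14:01"

# Hours between the two parses.
as.numeric(difftime(right, wrong, units = "hours"))  # 12
```

Every PM reading parsed with %H lands 12 hours early, which would reorder the series and distort any interpolation built on it.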