
Interpolating between 2 points at specified datetime values


I'm new to R. I have a CSV file with data like that shown below; it has 4000+ rows. I can't figure out how to pass the Timestamp column to the approx function. My code is below.

library(ggplot2)
library(pmml) 
library(XML)
library(gmodels)
library(zoo)
library("data.table") 

df <- fread("C:/Users/myprofile/Desktop/test logs/test1.csv",
                  select = c("Timestamp", "Var1"))

head(df)

df[['Timestamp']] <- as.POSIXct(df[['Timestamp']],
                                  format = "%Y %m %d %H:%M:%S:%OS")

seq1 <- zoo(order.by=(as.POSIXlt(seq(min(df$Timestamp), max(df$Timestamp), by=5))))

I'm not sure how to use the approx() function with the Timestamp data. How can I interpolate "Var1" between any 2 points for the kind of data that I have?

I get this error

seq1 <- zoo(order.by=(as.POSIXlt(seq(min(df$Timestamp), max(df$Timestamp), by=5))))
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

 dput(df)
structure(list(Timestamp = structure(c(1594146600, 1594146609, 
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616, 
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681, 
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct", 
"POSIXt"), tzone = "")), row.names = c(NA, -19L), .internal.selfref = <pointer: 0x000002aac2681ef0>, class = c("data.table", 
"data.frame"))

structure(list(Timestamp = structure(c(1594146600, 1594146609, 
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616, 
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681, 
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct", 
"POSIXt"), tzone = ""), Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48, 
0.63, 0.75, 0.86, 0.97, 1.2, 2.27, 4, 4.3, 3.02, 2.23, 1.79, 
1.62, 1.59, 1.63)), row.names = c(NA, -19L), class = "data.frame")



Solution

    1) Using the data frame from the last dput output in the question (reproduced in the Note at the end), build a regular 5-second grid of date/times, seq1, which fixes the code in the question. Then call approx, convert the resulting list to a data frame, and read that data frame into a zoo object.

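    library(zoo)
    
    # regular grid from the first to the last timestamp, one point every 5 seconds
    rng <- range(df$Timestamp)
    seq1 <- seq(rng[1], rng[2], by = 5)
    
    # linearly interpolate Var1 on the grid, then convert the result to zoo
    dfi <- with(df, data.frame(approx(Timestamp, Var1, seq1)))
    z <- read.zoo(dfi)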
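
The same approx call also answers the narrower question of a value at one specific datetime between two samples. This is a minimal sketch using the data from the Note. The query time and the names `secs`, `query`, `res` and `outside` are made up for illustration, and the times are built from epoch seconds in UTC so the result does not depend on the local time zone.

```r
# Data from the Note, rebuilt here so the block runs on its own.
secs <- c(1594146600, 1594146609, 1594146610, 1594146612, 1594146613,
          1594146614, 1594146615, 1594146616, 1594146618, 1594146619,
          1594146620, 1594146640, 1594146660, 1594146681, 1594146701,
          1594146721, 1594146741, 1594146761, 1594146782)
df <- data.frame(
  Timestamp = as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
  Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48, 0.63, 0.75, 0.86, 0.97, 1.2,
           2.27, 4, 4.3, 3.02, 2.23, 1.79, 1.62, 1.59, 1.63)
)

# Hypothetical query time: halfway between the samples at ...610 (-0.01)
# and ...612 (0.26).
query <- as.POSIXct(1594146611, origin = "1970-01-01", tz = "UTC")

# approx() treats the date/times as numbers and interpolates linearly.
res <- approx(df$Timestamp, df$Var1, xout = query)
print(res$y)

# With the default rule = 1, a query outside the data range gives NA.
outside <- approx(df$Timestamp, df$Var1, xout = max(df$Timestamp) + 60)
print(outside$y)
```

Passing `rule = 2` instead would return the nearest end value for out-of-range queries; whether that is appropriate depends on the data.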
    

    2) Alternatively, use na.approx, with seq1 as defined in (1) above. Create a zoo object from df and merge it with a zero-width zoo object that carries the time grid. The merge introduces NAs at the grid times that are not in the data, and na.approx fills them by linear interpolation. Finally, extract the values at the grid times.

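    library(zoo)
    
    zz <- read.zoo(df)   # zoo series of Var1 indexed by Timestamp
    z0 <- zoo(, seq1)    # zero-width series that carries only the grid times
    # merging adds the grid times as NAs; fill them, then keep only the grid
    na.approx(merge(zz, z0))[time(z0)]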
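
On the error in the question: a message that 'to' must be a finite number suggests the end points passed to seq were NA. One likely cause, which is an assumption because the raw CSV text is not shown, is that the format string given to as.POSIXct does not match the timestamps, so every value parses to NA and min/max are NA as well. Here is a minimal sketch with a made-up timestamp string (`raw`, `bad` and `good` are illustrative names):

```r
# Hypothetical raw timestamp; the actual CSV layout is not shown in the question.
raw <- "2020-07-07 18:30:00"

# The format from the question does not match this string, so parsing gives NA.
bad <- as.POSIXct(raw, format = "%Y %m %d %H:%M:%S:%OS", tz = "UTC")

# A format that matches the string parses cleanly.
good <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(is.na(bad))
print(is.na(good))

# The range of an all-NA column is NA too, leaving seq() no finite end points.
print(range(bad))
```

Checking `anyNA(df$Timestamp)` right after the conversion is a quick way to confirm or rule out this cause on the real file.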
    

    Note

    df <-
    structure(list(Timestamp = structure(c(1594146600, 1594146609, 
    1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616, 
    1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681, 
    1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct", 
    "POSIXt"), tzone = ""), Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48, 
    0.63, 0.75, 0.86, 0.97, 1.2, 2.27, 4, 4.3, 3.02, 2.23, 1.79, 
    1.62, 1.59, 1.63)), row.names = c(NA, -19L), class = "data.frame")