I'm new to R. I have a CSV file with the data shown below; it has 4000+ rows. I can't figure out how to pass the Timestamp data to the approx() function. Below is my code.
library(ggplot2)
library(pmml)
library(XML)
library(gmodels)
library(zoo)
library("data.table")
df <- fread("C:/Users/myprofile/Desktop/test logs/test1.csv",
            select = c("Timestamp", "Var1"))
head(df)
df[['Timestamp']] <- as.POSIXct(df[['Timestamp']],
                                format = "%Y %m %d %H:%M:%S:%OS")
seq1 <- zoo(order.by=(as.POSIXlt(seq(min(df$Timestamp), max(df$Timestamp), by=5))))
I'm not sure how to use the approx() function with the Timestamp data. How can I interpolate "Var1" at any two points for the kind of data that I have?
I get this error:
seq1 <- zoo(order.by=(as.POSIXlt(seq(min(df$Timestamp), max(df$Timestamp), by=5))))
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number
dput(df)
structure(list(Timestamp = structure(c(1594146600, 1594146609,
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616,
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681,
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -19L), .internal.selfref = <pointer: 0x000002aac2681ef0>, class = c("data.table",
"data.frame"))
structure(list(Timestamp = structure(c(1594146600, 1594146609,
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616,
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681,
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct",
"POSIXt"), tzone = ""), Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48,
0.63, 0.75, 0.86, 0.97, 1.2, 2.27, 4, 4.3, 3.02, 2.23, 1.79,
1.62, 1.59, 1.63)), row.names = c(NA, -19L), class = "data.frame")
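1) Using the last dput output shown in the question (also shown in the Note at the end), we create a sequence of date/times, seq1, fixing the code in the question, and then use approx, converting the resulting list to a data frame and then reading that data frame into a zoo object. The error in the question means that min(df$Timestamp) or max(df$Timestamp) is NA or otherwise not finite, for example because some timestamps failed to parse with the given format string.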
library(zoo)
rng <- range(df$Timestamp)
seq1 <- seq(rng[1], rng[2], 5)
dfi <- with(df, data.frame(approx(Timestamp, Var1, seq1)))
z <- read.zoo(dfi)
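The question also asks how to interpolate Var1 at any two points rather than on a regular grid. A minimal sketch in base R follows, assuming the data from the Note; the two query instants in pts and the tz = "UTC" setting are hypothetical choices made only for illustration.

```r
# Input data copied from the last dput output in the question.
# tz = "UTC" is an assumption, used only to keep the example reproducible.
secs <- c(1594146600, 1594146609, 1594146610, 1594146612, 1594146613,
          1594146614, 1594146615, 1594146616, 1594146618, 1594146619,
          1594146620, 1594146640, 1594146660, 1594146681, 1594146701,
          1594146721, 1594146741, 1594146761, 1594146782)
df <- data.frame(
  Timestamp = as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
  Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48, 0.63, 0.75, 0.86, 0.97, 1.2,
           2.27, 4, 4.3, 3.02, 2.23, 1.79, 1.62, 1.59, 1.63)
)

# Guard against unparsed (NA) timestamps, which would also make seq() fail.
stopifnot(!anyNA(df$Timestamp))

# Two hypothetical query instants: 11 s and 30 s after the first reading.
pts <- df$Timestamp[1] + c(11, 30)

# approx() interpolates linearly on the underlying number of seconds.
res <- approx(df$Timestamp, df$Var1, xout = pts)
interp <- data.frame(Timestamp = pts, Var1 = res$y)
print(interp)
```

With this data the two interpolated Var1 values are 0.125 and 3.135, each the midpoint of its neighbouring readings. By default approx returns NA for any query instant outside the range of the data.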
2) Alternatively, use na.approx. seq1 is from (1) above. Here we create a zoo object from df and merge it with a zero-width zoo object that holds the time grid. The merge introduces NAs at the grid times that are not in the data, and na.approx fills them by linear interpolation. Then we extract the values at the grid times.
library(zoo)
zz <- read.zoo(df)
z0 <- zoo(, seq1)
na.approx(merge(zz, z0))[time(z0)]
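To see what the merge step in (2) does, here is a small sketch on a made-up three-point series. The values, the times and tz = "UTC" are all hypothetical and are not taken from the question's data.

```r
library(zoo)

# Tiny hypothetical series: readings at 0, 4 and 12 seconds (UTC assumed).
tt <- as.POSIXct(c(0, 4, 12), origin = "1970-01-01", tz = "UTC")
zz <- zoo(c(0, 10, 30), tt)

# Regular 5-second grid spanning the data: 0, 5 and 10 seconds.
grid <- seq(min(tt), max(tt), by = 5)
z0 <- zoo(, grid)            # zero-width zoo object: an index with no data

m <- merge(zz, z0)           # union of both indexes; new grid times hold NA
filled <- na.approx(m)       # linear interpolation along the time index
on_grid <- filled[time(z0)]  # keep only the grid times
print(on_grid)
```

Here m has five time points, two of them NA (at 5 and 10 seconds), and on_grid holds 0, 12.5 and 25. The original reading at 4 seconds is used for the interpolation but dropped from the final result because it is not on the grid.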
Note
df <-
structure(list(Timestamp = structure(c(1594146600, 1594146609,
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616,
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681,
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct",
"POSIXt"), tzone = ""), Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48,
0.63, 0.75, 0.86, 0.97, 1.2, 2.27, 4, 4.3, 3.02, 2.23, 1.79,
1.62, 1.59, 1.63)), row.names = c(NA, -19L), class = "data.frame")