I have a time series dataset with around 120,000 rows, stored as a data frame. Most of the data is at 15-minute intervals, but there is also some monthly data. I want to keep only the 15-minute data and drop the monthly data, so I am calculating the difference between consecutive timestamps and then removing every row whose interval is not 15 minutes (900 seconds). My timestamp column is named 'DateTime'. I am using the following to calculate the time interval:
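n <- nrow(site_data)
# Gap to the next row in seconds; without units = "secs", difftime() picks the unit itself (possibly minutes)
site_data[seq_len(n - 1), "Interval"] <- as.numeric(difftime(site_data[2:n, "DateTime"],
site_data[seq_len(n - 1), "DateTime"], units = "secs"))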
But this code is taking too long to run. Is there a faster alternative to difftime? The timestamp column is POSIXct type date-time. Thank you.
Just use diff(as.numeric(timeCol)):
R> library(microbenchmark)
R> times <- Sys.time() + 1:1e5
R> microbenchmark(diff(times), diff(as.numeric(times)))
Unit: microseconds
                    expr      min      lq    mean  median      uq     max neval cld
             diff(times) 1653.999 2153.82 8871.00 2407.66 5313.88 41223.4   100   b
 diff(as.numeric(times))  774.058 1215.35 3910.26 1456.82 1846.53 35622.2   100  a
R>
Not a huge difference but about a factor of two in the mean.
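For completeness, here is a minimal sketch of how this could be applied to the filtering task in the question. The sample data is made up for illustration, and the rule of keeping a row when the gap on either side of it is 900 seconds is my assumption about what "keep only the 15 minute data" should mean, not something stated in the question.

```r
# Hypothetical sample: five 15-minute readings followed by three monthly readings.
quarter_hourly <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
                      by = "15 min", length.out = 5)
monthly <- seq(as.POSIXct("2020-02-01 00:00:00", tz = "UTC"),
               by = "month", length.out = 3)
site_data <- data.frame(DateTime = c(quarter_hourly, monthly), value = 1:8)

# Seconds between consecutive rows, computed on plain numerics.
gap <- diff(as.numeric(site_data$DateTime))

# Align the gaps with the rows: gap to the next row and gap from the previous row.
next_gap <- c(gap, NA)
prev_gap <- c(NA, gap)

# Assumed rule: keep a row if either neighbouring gap is exactly 900 s.
# %in% treats the NA at each end as "no match" rather than propagating NA.
keep <- (next_gap %in% 900) | (prev_gap %in% 900)
regular <- site_data[keep, ]
```

Checking the gap on both sides keeps the last reading of each 15-minute run, which a forward-only difference would drop because the gap that follows it is not 900 seconds.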