Mr. Ulrich's xts package is always phenomenal. I have always used split for the ordinary 5-, 15- and 30-minute splits. No problems ever.
Now I'm stuck.
#Setup test data
my.time <- seq(from = as.POSIXct('2000-01-01 00:00:00'),
               to = as.POSIXct('2000-01-01 1:00:00'),
               by = '1 sec')
my.data <- rep(10, length = length(my.time))
my.xts <- as.xts(my.data, order.by = my.time)
#Now splitting and checking endtimes of first split
tail((split(my.xts, f="minutes", k=20))[[1]])
tail((split(my.xts, f="minutes", k=30))[[1]])
#2000-01-01 00:19:59 10 #All good
#2000-01-01 00:29:59 10 #All good
tail((split(my.xts, f="minutes", k=22))[[1]])
#2000-01-01 00:11:59 10 #Hmmm, what am I missing? Expecting 00:21:59
#As endpoints is used by split I also checked this behaviour
endpoints(my.xts, on="minutes", k=20)
#[1] 0 1200 2400 3600 3601 #All good
endpoints(my.xts, on="minutes", k=30)
#[1] 0 1800 3600 3601 #All good
endpoints(my.xts, on="minutes", k=22)
#[1] 0 720 2040 3360 3601 #Hmmm
Trying to understand this, I dug further into the xts code at https://github.com/joshuaulrich/xts/blob/master/src/endpoints.c
In a comment there I found the R expression that the C code is supposed to implement more efficiently:
c(0,which(diff(_x%/%on%/%k+1) != 0),NROW(_x))
So I tried this
which(diff(.index(my.xts) %/% 60 %/% 20 +1) != 0)
#[1] 1200 2400 3600 #As expected
which(diff(.index(my.xts) %/% 60 %/% 21 +1) != 0)
#[1] 1080 2340 3600 #Expecting 1260 2520...
which(diff(.index(my.xts) %/% 60 %/% 22 +1) != 0)
#[1] 720 2040 3360 #Expecting 1320 2640...
which(diff(.index(my.xts) %/% 60 %/% 23 +1) != 0)
#[1] 720 2100 3480 #Expecting 1380 2760...
which(diff(.index(my.xts) %/% 60 %/% 24 +1) != 0)
#[1] 1440 2880 #As expected
which(diff(.index(my.xts) %/% 60 %/% 30 +1) != 0)
#[1] 1800 3600 #As expected
This is where my brain overheated and I posted here instead. I'm sure there is something I'm simply missing, so I haven't reported this as a bug on GitHub. Please help explain what is going on. Why am I not getting the expected results?
EDIT: After a quick think, I'm guessing this happens because all these functions count from the start of Unix time, and I'm using an interval that does not divide evenly into one hour. Am I on the right track?
EDIT2: Posted my answer below after finally understanding how endpoints and split are supposed to work...
Of course, split (and endpoints) work as they are supposed to within xts, i.e. splitting with 1970-01-01 00:00:00 as the start point when deciding the intervals.
And yes, I was misusing split as an easy way to split at an arbitrary start point of an hour within my time series xts data.
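To see where the 720 comes from, here is a minimal base-R sketch of the arithmetic. It assumes the session time zone is UTC (the outputs in my question are consistent with that, but it is an assumption), and first_breaks is a hypothetical helper name, not an xts function.

```r
# Assumption: tz = "UTC", so the numbers are reproducible on any machine.
start <- as.numeric(as.POSIXct("2000-01-01 00:00:00", tz = "UTC"))
secs  <- start + 0:3600              # same one-second grid as my.time

# Minutes elapsed since 1970-01-01 00:00:00 at the first timestamp
minutes_since_epoch <- start %/% 60

# How far into the current 22-minute bucket the series starts
minutes_since_epoch %% 22            # 10, so the bucket ends 12 min (720 s) later

# Hypothetical helper: positions where the epoch-aligned k-minute bucket changes
first_breaks <- function(k) {
  bucket <- secs %/% 60 %/% k
  which(diff(bucket) != 0)
}

first_breaks(22)                     # 720 2040 3360
first_breaks(20)                     # 1200 2400 3600
```

So with k=22 the first piece is only 12 minutes long because the series starts 10 minutes into a bucket that is counted from the epoch. With k=20, 24 or 30 the remainder happens to be zero at this start time, which is why those splits looked "right".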
Anyhow, I solved my little issue by writing this short function which "resets" the first timestamp to 1970-01-01 00:00:00.
## continuation from code snippet above
#So, I realised it was all about resetting the first time
#to 1970-01-01 00:00:00,
#so that I could easily do my "strange" splitting.
#Please note that this is not to be used for dates,
#only when working on hours, mins or secs
#and you don't care that you break the index.
startMovedSplit <- function(x, ...) {
  startIndex <- head(.index(x), n=1)
  .index(x) <- .index(x) - startIndex
  split(x, ...)
}
#New Try
tail((startMovedSplit(my.xts, f="minutes", k=20))[[1]])
tail((startMovedSplit(my.xts, f="minutes", k=30))[[1]])
#1970-01-01 00:19:59 10 #Good enough for my purposes
#1970-01-01 00:29:59 10 #Good enough for my purposes
tail((startMovedSplit(my.xts, f="minutes", k=22))[[1]])
#1970-01-01 00:21:59 10 #Good enough for my purposes
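If the original timestamps need to survive, one possible variation is to compute the bucket from the elapsed time and split the row numbers instead of shifting the index. This is only a sketch: splitFromStart is a hypothetical helper of mine, not part of xts, and tz = "UTC" is assumed to keep the example reproducible.

```r
library(xts)

# Hypothetical helper (not part of xts): group rows by elapsed time since the
# first observation, leaving the index itself untouched.
splitFromStart <- function(x, k) {
  elapsed <- .index(x) - .index(x)[1]   # seconds since the first timestamp
  bucket  <- elapsed %/% (60 * k)       # k-minute bucket counted from the start
  lapply(split(seq_len(nrow(x)), bucket), function(i) x[i, ])
}

my.time <- seq(from = as.POSIXct("2000-01-01 00:00:00", tz = "UTC"),
               to   = as.POSIXct("2000-01-01 01:00:00", tz = "UTC"),
               by   = "1 sec")
my.xts <- xts(rep(10, length(my.time)), order.by = my.time)

parts <- splitFromStart(my.xts, k = 22)
tail(parts[[1]], 1)                     # last row of the first 22-minute piece
```

The pieces keep their year-2000 timestamps, and the first one holds 1320 rows (22 minutes of one-second data), at the cost of not going through split and endpoints at all.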
And the best part of all my misunderstanding of the xts library? I now know how to handle the ellipsis (...) within functions and pass it on to subfunctions!
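For anyone else who is new to the ellipsis, this is the pattern in its smallest form, using base R only. shout is a made-up example function.

```r
# Hypothetical wrapper: whatever the caller passes via ... is handed on
# unchanged to paste(), including named arguments such as sep.
shout <- function(x, ...) {
  toupper(paste(x, ...))
}

shout("a", "b", sep = "-")   # "A-B"
```

startMovedSplit above does the same thing: f and k travel through ... straight into split.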