Say I have the following sample minute data.
> data = xts(1:12, as.POSIXct("2020-01-01")+(1:12)*60*20)
> data
[,1]
2020-01-01 00:20:00 1
2020-01-01 00:40:00 2
2020-01-01 01:00:00 3
2020-01-01 01:20:00 4
2020-01-01 01:40:00 5
2020-01-01 02:00:00 6
2020-01-01 02:20:00 7
2020-01-01 02:40:00 8
2020-01-01 03:00:00 9
2020-01-01 03:20:00 10
2020-01-01 03:40:00 11
2020-01-01 04:00:00 12
This already aligned minute data, but now I want to get hourly.
Easy, just use the to.hourly command right?
> to.hourly(data)
data.Open data.High data.Low data.Close
2020-01-01 00:40:00 1 2 1 2
2020-01-01 01:40:00 3 5 3 5
2020-01-01 02:40:00 6 8 6 8
2020-01-01 03:40:00 9 11 9 11
2020-01-01 04:00:00 12 12 12 12
The problem is that it puts the end values of each bar into the next bar, and the last value is creates its own hour period.
Now to only show correct hourly bars I use align.time.
> align.time(to.hourly(data),60*60)
data.Open data.High data.Low data.Close
2020-01-01 01:00:00 1 2 1 2
2020-01-01 02:00:00 3 5 3 5
2020-01-01 03:00:00 6 8 6 8
2020-01-01 04:00:00 9 11 9 11
2020-01-01 05:00:00 12 12 12 12
The previous last entry creates its own hour bar which I need to remove.
The same issue occurs if I convert to daily, the last enry goes to the next day and an extra day is created.
The question is how to convert to different periods correctly?
The desired result for the example is:
data.Open data.High data.Low data.Close
2020-01-01 01:00:00 1 3 1 3
2020-01-01 02:00:00 4 6 4 6
2020-01-01 03:00:00 7 9 7 9
2020-01-01 04:00:00 10 12 10 12
This seems like a very basic option and I have searched and found many examples, but not one that considers the last value in a period. Thank you.
UPDATE: Allan Cameron gave a fantastic answer and it absolutely works, I am just concerned that it will fail at some point with different time periods.
My workflow starts with tick data which I convert to second and minute and so on. Converting tick to higher periods would work perfectly, but it is too much data to handle at once, hence the staggered approach. That is why the aligned data needs to work with any period conversion.
I made as small modification to Allan's code:
setNames(shift.time(to.hourly(shift.time(data, -.0000001193)), .0000001193), c("Open", "High", "Low", "Close"))
.0000001193 was the smallest value I found to work with simple trial and error.
Is there any time where this would not work or would the min value be different?
Is this the best way to handle this issue?
Thank you.
You can shift the time back 60 seconds, do as.hourly
, then shift the time forward 60 seconds. This maintains the groupings. You'll need to rename the columns too:
setNames(shift.time(to.hourly(shift.time(data, -60)), 60), c("Open", "High", "Low", "Close"))
#> Open High Low Close
#> 2020-01-01 01:00:00 1 3 1 3
#> 2020-01-01 02:00:00 4 6 4 6
#> 2020-01-01 03:00:00 7 9 7 9
#> 2020-01-01 04:00:00 10 12 10 12