I have two data sets with currency data prices (High, Low, Open and Close) on two different timeframes (1 hour and 5 minute).
I have price targets for each row of the 1 Hour timeframe data, and a target direction for the trade (either 1 or -1).
For trades with a direction of '1', I'm trying to find the first point on the 5 minute chart where the "low" is <= the target level. Conversely, for trades with a direction of '-1', I want to find the first point on the 5-minute chart where the "high" is >= the target trade level.
I've put the code below to demonstrate what I'm after.
The problem I'm having in practice is that I'm looking at time periods spanning 10 - 20 years, which makes the joins very slow. I've set the example below to be from 2016 to 2022 so it's not too slow on my computer, but if you extend it out to more than 10 years it starts to really become a pain.
This may be something that I have to live with, but I'm looking for guidance on two things:
Thanks heaps in advance, Phil
# Load required packages
library(data.table)
library(dplyr)
# Define timeframes for the data
Start <- as.POSIXct("2016-01-01 00:00:00")
End <- as.POSIXct("2022-01-01 23:55:00")
Hours <- floor(as.numeric(difftime(End,Start,units = "hours"))) + 1
Minutes <- floor(as.numeric(difftime(End,Start,units = "mins")) / 5) + 1
# Create the Hourly data
set.seed(123)
hourly_prices <- data.table(
datetime = seq(Start, End, by = "hour"),
open = rnorm(Hours, mean = 100, sd = 1),
high = rnorm(Hours, mean = 101, sd = 1),
low = rnorm(Hours, mean = 99, sd = 1),
close = rnorm(Hours, mean = 100, sd = 1),
Direction = sample(c(1,-1),Hours,replace = T)) %>%
.[,Target_price := ifelse(Direction == -1,rnorm(.N, mean = 104, sd = 1),rnorm(.N,mean = 97,sd = 1))]
# Create the 5-minute data
set.seed(456)
minute_prices <- data.table(
datetime = seq(Start, End, by = "5 min"),
open = rnorm(Minutes, mean = 100, sd = 1),
high = rnorm(Minutes, mean = 101, sd = 1),
low = rnorm(Minutes, mean = 99, sd = 1),
close = rnorm(Minutes, mean = 100, sd = 1),
Position = seq_len(Minutes))
# Join the two data.tables to find the first point at which price passes the target levels
hourly_prices[(Direction == 1),Location := minute_prices[.SD, on = .(datetime > datetime, low <= Target_price),mult = "first",x.Position]]
hourly_prices[(Direction == -1),Location := minute_prices[.SD, on = .(datetime > datetime, high >= Target_price),mult = "first",x.Position]]
One simple ~3x speed-up would be to subset the minute_prices
data.table
to include only those rows that are the cumulative min/max for the current hour (minus 1 second, since the join is datetime > datetime
):
dtM <- copy(minute_prices)
dtH <- copy(hourly_prices)
system.time({
dtH[(Direction == 1),Location := dtM[dtM[, low == cummin(low), as.integer(datetime - 1)%/%3600L][[2]]][.SD, on = .(datetime > datetime, low <= Target_price),mult = "first",x.Position]]
dtH[(Direction == -1),Location := dtM[dtM[, high == cummax(high), as.integer(datetime - 1)%/%3600L][[2]]][.SD, on = .(datetime > datetime, high >= Target_price),mult = "first",x.Position]]
})
#> user system elapsed
#> 6.77 0.01 6.81
system.time({
hourly_prices[(Direction == 1),Location := minute_prices[.SD, on = .(datetime > datetime, low <= Target_price),mult = "first",x.Position]]
hourly_prices[(Direction == -1),Location := minute_prices[.SD, on = .(datetime > datetime, high >= Target_price),mult = "first",x.Position]]
})
#> user system elapsed
#> 22.97 0.00 23.04
identical(dtH, hourly_prices)
#> [1] TRUE