I want to download and save quote market data from Dukascopy: https://www.dukascopy.com/swiss/english/marketwatch/historical/
I have made a successful request to their server and got binary data as the result of a GET request:
library(httr)
url <- "https://datafeed.dukascopy.com/datafeed/USA30IDXUSD/2022/07/15/23h_ticks.bi5"
p <- GET(url,
         add_headers(`user-agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"))
p
content(p)
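If needed, the raw body of that response can be written straight to disk from R; a minimal sketch (the output path is just an example, not from the original code):

# Save the binary response body to a .bi5 file (path is illustrative).
bin <- content(p, "raw")
writeBin(bin, "23h_ticks.bi5")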
I don't know how to convert this binary data into something readable. I have found some scripts in Python: https://www.driftinginrecursion.com/post/dukascopy_opensource_data/ https://github.com/terukusu/download-tick-from-dukascopy/blob/master/download_tick_from_dukascopy.py
For example, the first link uses this function:
def bi5_to_csv(date_ts, out_dir, files):
    print('Starting Coversion of All .bi5 Files...')
    sort = sorted(files)
    chunk_size = struct.calcsize('>3i2f')
    data = []
    for bi5 in sort:
        try:
            size = os.path.getsize(bi5)
        except (IOError, OSError):
            break
        if size > 0:
            with lzma.open(bi5) as f:
                while True:
                    chunk = f.read(chunk_size)
                    if chunk:
                        data.append(struct.unpack('>3i2f', chunk))
                    else:
                        break
        os.remove(bi5)
    if not data:
        print('All Downloaded Files Where Empty!')
        return 1
    df = pd.DataFrame(data)
    df.columns = ['UTC', 'AskPrice', 'BidPrice', 'AskVolume', 'BidVolume']
    df.AskPrice = df.AskPrice / 100000
    df.BidPrice = df.BidPrice / 100000
    df.UTC = pd.TimedeltaIndex(df.UTC, 'ms')
    df.UTC = df.UTC.astype(str)
    df.UTC = df.UTC.replace(regex=['0 days'], value=[str(date_ts)])
    df.UTC = df.UTC.str[:-3]
    df.to_csv(out_dir + '/daily.csv', index=False)
    print('Finished Converting Files!')
    return 0
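For reference, the struct format '>3i2f' describes one 20-byte record: three big-endian 4-byte integers followed by two big-endian 4-byte floats. In R the same unpacking can be sketched with readBin(), which handles both endianness and 4-byte floats (the chunk variable below is assumed to hold one 20-byte raw record, it is not part of the original scripts):

# Hedged sketch: unpack one 20-byte raw record 'chunk' the way
# struct.unpack('>3i2f', chunk) does in Python.
ints   <- readBin(chunk,        "integer", n = 3, size = 4, endian = "big")  # TIME, ASKP, BIDP
floats <- readBin(chunk[13:20], "numeric", n = 2, size = 4, endian = "big")  # ASKV, BIDV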
The second script uses:
def tokenize(buffer):
    token_size = 20
    token_count = int(len(buffer) / token_size)
    tokens = list(map(lambda x: struct.unpack_from('>3I2f', buffer, token_size * x), range(0, token_count)))
    return tokens

def normalize_tick(symbol, day, time, ask, bid, ask_vol, bid_vol):
    date = day + timedelta(milliseconds=time)
    # TODO: cover all cases. There may be currency pairs other than these.
    if any(map(lambda x: x in symbol.lower(), ['usdrub', 'xagusd', 'xauusd', 'jpy'])):
        point = 1000
    else:
        point = 100000
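A rough R equivalent of that normalize_tick() logic might look like the sketch below. It assumes 'day' is the POSIXct start of the hour the file covers and 'time_ms' is the millisecond offset from the first integer of each record; the function and argument names are illustrative, not from any existing package:

# Hedged sketch of the scaling logic above.
normalize_tick <- function(symbol, day, time_ms, ask, bid, ask_vol, bid_vol) {
  # JPY pairs and a few other symbols are scaled by 1,000; everything else by 100,000
  point <- if (grepl("usdrub|xagusd|xauusd|jpy", tolower(symbol))) 1e3 else 1e5
  data.frame(
    time    = day + time_ms / 1000,  # POSIXct arithmetic works in seconds
    ask     = ask / point,
    bid     = bid / point,
    ask_vol = ask_vol,
    bid_vol = bid_vol
  )
}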
I don't know how to apply this code in R, or in general how to convert this binary file to tick data. I have also found a C++ implementation: Reading data from Dukascopy tick binary file
My weird attempt:
Reading a bit about bi5 files tells us that they are binary files compressed with LZMA (see: reddit). So first we have to decompress the file. To do that I'm using the lzma command-line tool.
library(httr)
binF <- "/home/sapi/aaa/23h_ticks.lzma"
url <- "https://datafeed.dukascopy.com/datafeed/USA30IDXUSD/2022/07/15/23h_ticks.bi5"
p <- GET(url,
         add_headers(`user-agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"),
         write_disk(binF, overwrite = TRUE))
system(paste("lzma -d", binF))
Having decompressed it, we have to read it in (as binary) and convert it to meaningful data. From the reddit discussion above we can see that:
The data is stored in 20-byte rows, with each 4-byte segment corresponding to a piece of data. Example:
# TIME is a 32-bit big-endian integer representing the number of milliseconds that have passed since the beginning of this hour.
# ASKP is a 32-bit big-endian integer representing the asking price of the pair, multiplied by 100,000.
# BIDP is a 32-bit big-endian integer representing the bidding price of the pair, multiplied by 100,000.
# ASKV is a 32-bit big-endian floating point number representing the asking volume, divided by 1,000,000.
# BIDV is a 32-bit big-endian floating point number representing the bidding volume, divided by 1,000,000.
Now we have to find a way to convert the big-endian values into something usable, and this Stack Overflow question gives us a hint.
library(dplyr)
binF <- "/home/sapi/aaa/23h_ticks"
con <- file(binF, "rb")
for (i in 0:(file.size(binF)/20 - 1)) {
  data <- readBin(con = con, "raw", 20)
  # reassemble each big-endian 4-byte field from its raw bytes
  TIME <- data[4:1] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
  ASKP <- data[8:5] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
  BIDP <- data[12:9] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
  ASKV <- data[16:13] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
  BIDV <- data[20:17] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
  print(paste(TIME, ASKP, BIDP, ASKV, BIDV))
}
close(con)
#> [1] "246 33867231 33862497 946528651 943230116"
#> [1] "296 33867291 33861749 947628162 943230116"
#> [1] "421 33867003 33862299 897988541 943230116"
#> [1] "472 33867781 33862219 947628162 943230116"
#> [1] "2440 33867773 33862779 897988541 943230116"
[...]
#> [1] "3583902 33882501 33877289 947628162 943230116"
Please note I have no idea if the data are correct/meaningful. However, looking at the milliseconds column it makes sense, as the last value is around 3,600,000 ms, i.e. one hour.
Created on 2022-09-05 with reprex v2.0.2
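As a follow-up, readBin() can do the big-endian conversion directly, and it can also read the volume fields as the 4-byte floats they are (summing their bits as integers is what produces the very large ASKV/BIDV numbers above). A sketch, assuming the decompressed file from above and an instrument scaled by 100,000; the hour timestamp is an assumption and should be adjusted to whatever hour the file actually covers:

# Hedged sketch: read the whole decompressed hour file with readBin().
binF  <- "/home/sapi/aaa/23h_ticks"
hour0 <- as.POSIXct("2022-07-15 23:00:00", tz = "UTC")  # assumed start of the hour; adjust as needed
con   <- file(binF, "rb")
ticks <- do.call(rbind, lapply(seq_len(file.size(binF) / 20), function(i) {
  ints   <- readBin(con, "integer", n = 3, size = 4, endian = "big")  # TIME, ASKP, BIDP
  floats <- readBin(con, "numeric", n = 2, size = 4, endian = "big")  # ASKV, BIDV as 4-byte floats
  data.frame(
    time    = hour0 + ints[1] / 1000,  # ms offset from the start of the hour
    ask     = ints[2] / 1e5,           # price scaling as described above
    bid     = ints[3] / 1e5,
    ask_vol = floats[1],               # volumes left unscaled; see the notes above
    bid_vol = floats[2]
  )
}))
close(con)
head(ticks)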