I'd like to create an application that can replay historical, tick-by-tick, multi-level order-book changes. My question is: how does one go about doing this?
The immediate problem I am facing is how to simulate the actual data-volume bursts. A burst here is defined as a high volume of events occurring within a given time span (say, microseconds). For example, if I just loop through the data event by event and publish each one to my consumers, this would not take into account the actual time differences between consecutive events as they occurred in real life. Since market data arrives asynchronously, I need to be able to model that.
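To make the problem concrete, here is a minimal sketch (Python, with a hypothetical `Event` type and `ts_ns` timestamp field) contrasting the naive loop I described with one that paces publishing according to the recorded inter-event gaps:

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    ts_ns: int       # recorded exchange timestamp, nanoseconds (hypothetical field)
    payload: object  # the book-change record itself


def replay_naive(events, publish):
    # Publishes as fast as the loop runs -- all inter-event gaps are lost.
    for ev in events:
        publish(ev)


def replay_paced(events, publish, speed=1.0):
    # Reproduces the recorded inter-event gaps (optionally scaled by `speed`).
    it = iter(events)
    first = next(it, None)
    if first is None:
        return
    t0_wall = time.perf_counter_ns()
    t0_data = first.ts_ns
    publish(first)
    for ev in it:
        due = t0_wall + (ev.ts_ns - t0_data) / speed
        # Busy-wait instead of time.sleep(): OS sleeps cannot hit
        # sub-millisecond targets reliably.
        while time.perf_counter_ns() < due:
            pass
        publish(ev)
```

This is only a single-process sketch of the idea; it already shows that the replayer needs its own notion of "data time" mapped onto wall-clock time, rather than a plain loop.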
Any suggestions or resources are highly appreciated.
Thanks,
Given the stated requirements, a high-fidelity emulator of a realistic flow of events suitable for HFT validation, your design choices are not endless.
Given that just one instrument ( GBPUSD ) can generate many thousands ( yes, many thousands ) of L3-DoM changes within the same [usec] ( yes, inside the same one single microsecond ), your design has to decide between two principal ways:
go a distributed way ( using a set of remote resources ), so as to inject that immense flow of events in a controlled flow-of-time manner, independently of your actual consumer's ( localhost ) workload / spare processing capacity
go an FPGA way ( using internally connected PCIe hardware ), which can inject the said flow of events, again independently of the consumer's operations
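Whichever way you go, the principle is the same: the pacing clock must live outside the consumer. A minimal single-host sketch of that decoupling ( assuming events are hypothetical `( ts_ns, payload )` tuples; a real deployment would put the injector on a remote box or an FPGA rather than a thread ):

```python
import queue
import threading
import time


def start_injector(events, out_q, speed=1.0):
    # Runs the pacing loop on its own thread, so a slow consumer cannot
    # distort the injected flow-of-time; the queue absorbs the bursts.
    def run():
        it = iter(events)
        first = next(it, None)
        if first is None:
            out_q.put(None)
            return
        t0_wall = time.perf_counter_ns()
        t0_data = first[0]
        out_q.put(first)
        for ts_ns, payload in it:
            due = t0_wall + (ts_ns - t0_data) / speed
            while time.perf_counter_ns() < due:  # busy-wait for precision
                pass
            out_q.put((ts_ns, payload))
        out_q.put(None)  # sentinel: end of stream

    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```

The consumer then drains `out_q` at its own pace; the injector's timestamps remain faithful to the recording no matter how long the consumer's processing takes.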
Either way is by far beyond what a weekend hackathon or a few co-sponsored forks may bring to life.
Availability and the static scale of all the historical, time-stamped L3-DoM data, plus the records of the flow of trade events, are not an issue. The high fidelity of the flow of time ( orchestrating the injected ingress of market-generated messages into a highly dynamic simulation with a bi-directional flow of transactions ), which on the consumer side alone runs close to the hardware performance envelope, definitely is.
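To see why that performance envelope matters, one can measure how far an ordinary OS scheduler overshoots even a 1 [usec] sleep request ( a hypothetical helper; on a typical non-realtime kernel the overshoot is orders of magnitude above the microsecond granularity of the data ):

```python
import time


def sleep_jitter_ns(request_ns: int, samples: int = 200) -> int:
    # Measures the worst-case overshoot of time.sleep() versus the
    # requested duration. Plain sleeps cannot reproduce [usec]-granular
    # inter-event gaps, which is why the pacing has to move off-host
    # ( remote injector ) or into hardware ( FPGA / PCIe ).
    worst = 0
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        time.sleep(request_ns / 1e9)
        overshoot = (time.perf_counter_ns() - t0) - request_ns
        worst = max(worst, overshoot)
    return worst
```

Running it with `request_ns=1_000` on a commodity box typically reports overshoots in the tens of microseconds or worse, i.e. many L3-DoM events' worth of lost timing fidelity per sleep.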
As Seymour Cray has stated:
"Anyone can build a fast CPU. The trick is to build a fast system."
Building a system that works slower than real time seems less demanding, but such an approach will never again catch the market. This is not even speaking about running any optimisation strategy over some hyperparameter space, where the additional adverse scaling ( PSPACE / PTIME, but more often EXPSPACE / EXPTIME ) moves any such attempt into a dead end of computability constraints: no result(s) can be expected within the reasonable amount of real time available for running such a simulated / to-be-optimised system.