Tags: finance, fft, fingerprint, datafeed

How can I compare market data feed sources for quality and latency improvement?


I am in the very first stages of implementing a tool to compare two market data feed sources, in order to prove the quality of newly developed sources to my boss (meaning there are no regressions, no missed updates, and no wrong updates) and to prove the latency improvements.

So the tool I need must be able to check for differences between updates as well as tell which source is better in terms of latency.

Concretely, the reference source could be Reuters, while the other is a feed handler we develop internally. People have warned me that updates might not arrive in the same order, as the Reuters implementation could differ completely from ours. Therefore, a simple algorithm that assumes updates arrive in the same order is unlikely to work.

My very first idea is to use fingerprints to compare the feed sources, as the Shazam application does to find the title of the tune you submit. Google told me it is based on FFT, and I was wondering whether signal processing theory could work well for market access applications.

I would like to hear about your own experience in this field. Is it possible to develop a reasonably accurate algorithm to meet these needs? What was your approach? What do you think about fingerprint-based comparison?


Solution

  • If the exchange that provides the data attaches a unique identifier to each update, the implementation is fairly straightforward, but not trivial.

    In essence, you have an app that subscribes to the two feeds. (You can also do this with sniff-based software for non-intrusive monitoring and measurement; I can try to address that as well.)

    You would keep two lists of unmatched updates, one per feed (or use any other method of tracking "unmatched" samples). As each update comes in, you look for the corresponding item in the other feed's list. When you find a match, you save the pairing. You also have to assign each update a "time stamp" as it arrives, most likely the local machine time. Since the origin in this simple case is the same exchange, determining relative latency is fairly easy: it is the difference between the two arrival times of a matched pair.

    This method requires writing subscribing apps for the data.

    There are lots of issues, such as handling missing updates and timing out unmatched data, handling exchanges or feeds that might not provide unique IDs for updates, working around data vendors' mistakes with respect to local vs. UTC time, etc.

    Sniffing the data is similar, but you'd capture the data through pcap or hardware capture cards and then parse the streams based on the endpoints of the packets. This is a bit more difficult than straight subscription, but it has the advantage of being non-intrusive and fairly flexible about which sets of data you can measure.
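
The matching approach described above could be sketched roughly as follows. This is only a minimal illustration, not a production design: the class name `FeedMatcher`, the feed labels `"ref"` and `"new"`, the `timeout` value, and the shape of the payloads are all assumptions made for the example. It also assumes that every update carries an exchange-provided unique ID and that both feeds are time-stamped with the same local clock.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


def _empty_pending():
    # One arrival-ordered queue of unmatched updates per feed.
    return {"ref": OrderedDict(), "new": OrderedDict()}


@dataclass
class FeedMatcher:
    """Pairs updates from two feeds by a shared unique ID (hypothetical sketch)."""
    timeout: float = 5.0  # seconds before an unmatched update counts as missed
    pending: dict = field(default_factory=_empty_pending)
    latencies: list = field(default_factory=list)   # t_new - t_ref per matched pair
    mismatches: list = field(default_factory=list)  # IDs whose payloads differ
    missed: list = field(default_factory=list)      # (feed that saw it, update ID)

    def on_update(self, feed, update_id, payload, recv_time):
        """Handle one update; recv_time is the local arrival time in seconds."""
        other = "new" if feed == "ref" else "ref"
        self.expire(recv_time)
        match = self.pending[other].pop(update_id, None)
        if match is None:
            # Not seen on the other feed yet: remember it until it matches or expires.
            self.pending[feed][update_id] = (recv_time, payload)
            return
        other_time, other_payload = match
        if feed == "ref":
            t_ref, t_new = recv_time, other_time
        else:
            t_ref, t_new = other_time, recv_time
        # Negative value: the new feed delivered this update before the reference.
        self.latencies.append(t_new - t_ref)
        if payload != other_payload:
            self.mismatches.append(update_id)

    def expire(self, now):
        """Move updates that waited longer than the timeout to the missed list."""
        for feed, queue in self.pending.items():
            while queue:
                update_id, (arrived, _) = next(iter(queue.items()))
                if now - arrived < self.timeout:
                    break  # queue is in arrival order, so the rest are younger
                queue.popitem(last=False)
                self.missed.append((feed, update_id))


# Usage with made-up updates; order differs between the two feeds.
matcher = FeedMatcher(timeout=1.0)
matcher.on_update("ref", "A", {"bid": 10.0}, 0.000)
matcher.on_update("new", "B", {"bid": 11.0}, 0.001)
matcher.on_update("new", "A", {"bid": 10.0}, 0.003)
matcher.on_update("ref", "B", {"bid": 11.0}, 0.004)
print(matcher.latencies, matcher.mismatches, matcher.missed)
```

Because matching is keyed on the ID rather than on position in the stream, the two feeds do not need to deliver updates in the same order. Feeds without unique IDs would need a different key (for example, some combination of instrument, exchange time stamp and field values), which this sketch does not attempt.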