Tags: node.js, network-programming, udp, multicast, algorithmic-trading

Parsing TCP payloads using a custom spec


My goal is to create a parser for TCP packets that use a custom spec from the Options Price Reporting Authority (found here), but I have no idea where to start. I've never worked with low-level networking, and I'd appreciate some guidance.

The big problem is that I don't have access to the actual network, because it costs a huge sum per month, so all I can work from is the specification. I don't even know if it's possible. Do you parse each byte step by step and hope for the best? Do you first re-create some example data using the bytes in the spec and then parse it? Isn't that also difficult, since (I think) TCP spreads the data across multiple blocks?


Solution

  • That's quite an elaborate data feed. A quick review of the spec shows that it contains enough information to write a program in either Node.js or Go to ingest it.

    Getting it to work will be a big job. Your question didn't mention your level of programming or network engineering skill, so it's hard to guess how much learning lies ahead of you to get this done.

    A few things.

    1. It's a complex enough protocol that you will need to test it with correctly formatted sample data. You need a fairly large collection of sample packets in order to mock your data feed (that is, build a fake data feed for testing purposes). While nothing is impossible, it will be very difficult to build a bug-free program to handle this data without extensive tests.

      If you have a developer relationship to the publisher of the data feed, you should ask if they offer sample data for testing.
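
      Until real samples are available, one way to start is to hand-build datagrams from the field tables in the spec and round-trip them through your parser. The sketch below assumes a made-up 8-byte header (version, 4-byte big-endian sequence number, message count, 2-byte body length). That layout and the names buildSamplePacket and parsePacket are purely illustrative, not the OPRA format, so substitute the real offsets and widths from the spec.

```javascript
'use strict';

// Hypothetical 8-byte header, NOT the real OPRA layout:
//   offset 0  uint8     version
//   offset 1  uint32BE  sequence number
//   offset 5  uint8     message count
//   offset 6  uint16BE  body length in bytes
const HEADER_LENGTH = 8;

// Build a fake datagram so the parser can be tested without the live feed.
function buildSamplePacket({ version, sequence, messageCount, body }) {
  const header = Buffer.alloc(HEADER_LENGTH);
  header.writeUInt8(version, 0);
  header.writeUInt32BE(sequence, 1);
  header.writeUInt8(messageCount, 5);
  header.writeUInt16BE(body.length, 6);
  return Buffer.concat([header, body]);
}

// Parse one datagram; reject truncated input instead of guessing.
function parsePacket(packet) {
  if (packet.length < HEADER_LENGTH) {
    throw new RangeError('packet shorter than header');
  }
  const bodyLength = packet.readUInt16BE(6);
  if (packet.length < HEADER_LENGTH + bodyLength) {
    throw new RangeError('packet shorter than declared body length');
  }
  return {
    version: packet.readUInt8(0),
    sequence: packet.readUInt32BE(1),
    messageCount: packet.readUInt8(5),
    body: packet.subarray(HEADER_LENGTH, HEADER_LENGTH + bodyLength),
  };
}

// Round trip: build a sample, then parse it back.
const sample = buildSamplePacket({
  version: 1,
  sequence: 42,
  messageCount: 1,
  body: Buffer.from('ABC', 'ascii'),
});
const parsed = parsePacket(sample);
console.log(parsed.sequence, parsed.body.toString('ascii'));
```

      Because each datagram arrives whole, the parser can work on one Buffer at a time; the same builder can generate malformed packets (short, wrong length field) for negative tests.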

    2. It is not a TCP/IP data feed. It is an IP multicast datagram feed. With IP multicast feeds, you set up a server to listen for the incoming data packets. They use multicast to achieve the very low latencies necessary for predatory algorithmic trading.

      • You won't use TCP sockets to receive it; you'll use a different programming interface, UDP datagram sockets.
      • If you're used to TCP's automatic recovery from errors, datagrams will be a challenge. With datagrams you cannot tell if you failed to receive data except by looking at sequence numbers. Most data feeds using IP and multicast have some provision for retransmitting data. Your spec is no exception. You must handle retransmitted data correctly or it will look like you have lots of duplicate data.
      • Multicast data doesn't move over the public network. You'll need a virtual private network connection to the publisher, or to co-locate your servers in a data center where the feed is available on an internal network.
      • There's another, operational, spec you'll need to cope with to get this data. It's called the Common IP Multicast Distribution Network Recipient Interface Specification. This spec has a primer on the multicast dealio.
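
      As a rough illustration of the points above, here is a minimal Node.js sketch that joins a multicast group with the built-in dgram module and classifies each datagram by sequence number. SequenceTracker, listen, and readSequence are hypothetical names; the group address, port, and the way a sequence number is read from a packet must come from the spec and your provider. The sketch assumes sequence numbers increase by one per packet and ignores wraparound and resets, and the spec's actual retransmission rules may differ.

```javascript
'use strict';
const dgram = require('dgram');

// Classifies each sequence number as 'in-order', 'gap', 'recovered'
// (a late or retransmitted copy of a missing packet), or 'duplicate'.
class SequenceTracker {
  constructor() {
    this.expected = null;     // next sequence number we expect
    this.missing = new Set(); // sequence numbers skipped so far
  }

  observe(seq) {
    if (this.expected === null || seq === this.expected) {
      this.expected = seq + 1;
      return 'in-order';
    }
    if (seq > this.expected) {
      // Everything between expected and seq was lost (so far).
      for (let s = this.expected; s < seq; s++) this.missing.add(s);
      this.expected = seq + 1;
      return 'gap';
    }
    // seq is older than expected: either it fills a hole or we already saw it.
    return this.missing.delete(seq) ? 'recovered' : 'duplicate';
  }
}

// Hypothetical helper: join a multicast group and pass every non-duplicate
// datagram to onPacket. Nothing is opened until this function is called.
function listen({ group, port, readSequence, onPacket }) {
  const tracker = new SequenceTracker();
  const socket = dgram.createSocket({ type: 'udp4', reuseAddr: true });
  socket.on('message', (packet) => {
    const status = tracker.observe(readSequence(packet));
    if (status !== 'duplicate') onPacket(packet, status);
  });
  socket.bind(port, () => socket.addMembership(group));
  return socket;
}
```

      A 'gap' result is the only signal that data was lost, so that is where a real client would decide whether to request or wait for a retransmission.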

    You can do this. When you have made it work, you will have gained some serious skills in network programming and network engineering.

    But if you just want this data, you might try to find a reseller of the data that repackages it in an easier-to-consume format. That reseller probably also imposes a delay on the data feed.