I'm pulling open-position data from an exchange API. Their WebSocket server is quite unstable, and we often miss WebSocket updates. It's important to me not to miss anything. The problem is that I can't guarantee "at-least-once" delivery without some sort of ACK/NACK scheme between my server and theirs, which isn't possible.
I was thinking along these lines: if we miss any WebSocket updates, we can fall back to HTTP at some point.
A single request to their REST API returns the status of all open positions, and that endpoint is reliable. Perhaps the WebSocket feed and the REST endpoint should somehow work together to achieve that.
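One way to let the two work together: treat WebSocket updates as fast but lossy, and periodically treat the full REST snapshot as the source of truth. Below is a minimal Python sketch of that idea. The `PositionBook` class, the symbol-to-quantity representation, and the "quantity 0 means closed" convention are all hypothetical and would need to be adapted to the exchange's actual payloads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical representation: symbol -> signed position quantity.
Positions = Dict[str, float]


@dataclass
class PositionBook:
    """Local view of open positions, fed by WebSocket updates and
    periodically corrected against a full REST snapshot."""
    positions: Positions = field(default_factory=dict)

    def apply_ws_update(self, symbol: str, quantity: float) -> None:
        # Assumed convention: a quantity of 0 means the position was closed.
        if quantity == 0:
            self.positions.pop(symbol, None)
        else:
            self.positions[symbol] = quantity

    def reconcile(self, snapshot: Positions) -> List[Tuple[str, float, float]]:
        """Replace local state with the REST snapshot and return
        (symbol, local_qty, snapshot_qty) for every mismatch, i.e. the
        changes the WebSocket feed apparently missed."""
        mismatches = []
        for symbol in sorted(set(self.positions) | set(snapshot)):
            local = self.positions.get(symbol, 0.0)
            remote = snapshot.get(symbol, 0.0)
            if local != remote:
                mismatches.append((symbol, local, remote))
        # The snapshot wins; closed (zero) positions are dropped.
        self.positions = {s: q for s, q in snapshot.items() if q != 0}
        return mismatches
```

Logging the returned mismatches also tells you how often the feed actually drops updates. One caveat: a WebSocket update that arrives while the REST request is in flight can be overwritten by a slightly older snapshot, so real code may need timestamps or sequence numbers, if the exchange provides them, to decide which value is newer.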
Backpressure is the second issue. It's easily solved with Akka.NET, but if you would suggest a different approach, I'd like to know how you would handle it.
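Akka.NET isn't the only option. As a language-neutral illustration (this is not Akka.NET's API), a bounded queue between the socket reader and the processing code already gives basic backpressure. Here is a minimal Python `asyncio` sketch, with a plain list of strings standing in for the WebSocket feed:

```python
import asyncio
from typing import List, Optional


async def producer(queue: asyncio.Queue, messages: List[str]) -> None:
    # Stands in for the WebSocket receive loop. Because the queue is
    # bounded, `put` suspends while the queue is full, so a slow consumer
    # slows the reader down instead of letting memory grow without limit.
    for msg in messages:
        await queue.put(msg)
    await queue.put(None)  # sentinel: no more messages


async def consumer(queue: asyncio.Queue, handled: List[str],
                   delay_s: float = 0.001) -> None:
    while True:
        msg: Optional[str] = await queue.get()
        if msg is None:
            break
        await asyncio.sleep(delay_s)  # simulate slow processing
        handled.append(msg)


async def run(messages: List[str], maxsize: int = 4) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    handled: List[str] = []
    await asyncio.gather(producer(queue, messages), consumer(queue, handled))
    return handled
```

Pausing the reader only pushes the pressure back onto the TCP connection. Some servers drop clients that read too slowly, so the consumer still has to keep up on average.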
What are you guys going to use in such situations?
> websocket server is quite unstable
The third-party system seems like the likely culprit, and there's nothing you can do if they never send a message. But let's assume they send every message they should...
From what you've said, the WebSocket stays alive and you keep receiving messages after one is missed. WebSockets run over TCP, which delivers data reliably and in order: a segment that isn't acknowledged by the receiving end is retransmitted, and if retransmission keeps failing, the connection fails rather than delivering later messages past the gap. In C#/.NET this would eventually surface as a connection exception (timeout, connection closed, etc.). That seems to rule out the network transport as the cause.
On the receiving side, there are several libraries that can be used to handle WebSockets. Without seeing the "message received" code (or even knowing which library is used), it's difficult to tell whether messages are being processed correctly. That code deserves close analysis, particularly for threading issues.
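To illustrate the kind of threading issue meant here, the Python sketch below makes two assumptions: that the handler receives incremental fill deltas (a hypothetical message shape), and that the client library may invoke it from more than one thread. An unguarded read-modify-write in that situation can silently lose an update, which looks exactly like a "missed" message.

```python
import threading
from collections import defaultdict
from typing import Dict


class PositionStore:
    """Shared state touched by the WebSocket 'message received' handler."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._quantities: Dict[str, float] = defaultdict(float)

    def on_fill(self, symbol: str, delta: float) -> None:
        # Read the current quantity, add the delta, write it back. Without
        # the lock, two threads can read the same old value and one of the
        # two updates is lost. The lock makes the whole step atomic.
        with self._lock:
            current = self._quantities[symbol]
            self._quantities[symbol] = current + delta

    def snapshot(self) -> Dict[str, float]:
        # Copy under the lock so readers never see a half-applied update.
        with self._lock:
            return dict(self._quantities)
```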
The most likely problem, though, is that the third-party system sends batches of data rather than one message at a time. On the surface it may appear that only one value ("position") is being sent, but several updates could be sent together. Inspecting the traffic with Fiddler (or logging each raw payload as it is received) might help determine whether this is the case.
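A receive handler that copes with batched payloads might log the raw text and then normalize every message to a list before processing it. The three payload shapes handled below are assumptions for illustration, not the exchange's documented format:

```python
import json
import logging
from typing import Any, Dict, List

log = logging.getLogger("ws.raw")


def extract_updates(raw: str) -> List[Dict[str, Any]]:
    """Turn one raw WebSocket payload into a list of position updates.

    Hypothetical payload shapes:
      {"symbol": "...", "qty": ...}          a single update
      [{"symbol": ...}, {"symbol": ...}]     a bare batch
      {"data": [{"symbol": ...}, ...]}       a wrapped batch
    """
    log.debug("raw payload: %s", raw)  # keep the full payload for diagnosis
    msg = json.loads(raw)
    if isinstance(msg, list):
        return msg
    if isinstance(msg, dict) and isinstance(msg.get("data"), list):
        return msg["data"]
    return [msg]
```

A handler that reads only the first element, or only top-level fields, would silently drop the rest of a batch.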
> What are you guys going to use in such situations?
If the third-party system turns out to be the problem, it seems there's no option other than to supplement the WebSocket by polling their REST endpoint as frequently as their documentation allows. If the limit isn't stated, you may need to contact their engineering team to make sure you don't end up launching a friendly DoS attack.
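Below is a minimal sketch of a polling loop that never exceeds a fixed request rate. `fetch`, `apply`, and `min_interval_s` are hypothetical parameters, and the real interval has to come from the exchange's documented rate limit. The clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import time
from typing import Callable, Dict

Positions = Dict[str, float]


def poll_positions(fetch: Callable[[], Positions],
                   apply: Callable[[Positions], None],
                   min_interval_s: float,
                   iterations: int,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Call the REST snapshot endpoint no more often than min_interval_s.

    `fetch` performs the (hypothetical) REST call and `apply` merges the
    result into local state.
    """
    next_allowed = clock()
    for _ in range(iterations):
        now = clock()
        if now < next_allowed:
            sleep(next_allowed - now)
        started = clock()
        apply(fetch())
        # Space request *starts* at least min_interval_s apart, even if
        # the request itself was slow.
        next_allowed = started + min_interval_s
```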
If their system is sending all updates correctly, you might want to tighten the timeouts so that a stalled WebSocket connection is closed and reopened sooner. Then hit the REST endpoint right after reconnecting to minimize how long your data is out of date.
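The following sketch shows the tighten-timeouts-then-resync idea, using Python's `asyncio.wait_for` as an idle timeout. The `Connection` interface, `connect` (assumed to also subscribe), and `fetch_snapshot` are hypothetical stand-ins for your WebSocket client and REST call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Protocol

Positions = Dict[str, float]


class Connection(Protocol):
    # Hypothetical minimal interface; adapt to your WebSocket client.
    async def recv(self) -> str: ...
    async def close(self) -> None: ...


async def run_feed(connect: Callable[[], Awaitable[Connection]],
                   fetch_snapshot: Callable[[], Awaitable[Positions]],
                   on_message: Callable[[str], None],
                   on_snapshot: Callable[[Positions], None],
                   idle_timeout_s: float,
                   max_reconnects: int) -> None:
    """Treat idle_timeout_s of silence as a dead connection: close it,
    reconnect, then immediately resync from REST to cover the gap."""
    for _ in range(max_reconnects + 1):
        conn = await connect()
        # Fetch the snapshot only after the connection is open, so changes
        # made in between are still delivered over the socket.
        on_snapshot(await fetch_snapshot())
        try:
            while True:
                msg = await asyncio.wait_for(conn.recv(), timeout=idle_timeout_s)
                on_message(msg)
        except asyncio.TimeoutError:
            pass  # stalled: close and reconnect on the next iteration
        finally:
            # Runs on timeout and on any other error (which still propagates).
            await conn.close()
```

Pick `idle_timeout_s` comfortably above the longest quiet period you expect between legitimate messages. Otherwise every lull in trading triggers a needless reconnect.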