
FIX Protocol: Receiving an out-of-sequence message during retransmission causes a retransmission loop


I have a FIX client using QuickFIX/n as the FIX layer.

If for some technical reason my client gets disconnected, the FIX server will continue sending messages until it notices the client is no longer there (through heartbeats, I assume).

When my client reconnects, it notices the gap on the first message. For instance, if the last message my client received has SeqNum=124 and upon reconnection the server sends SeqNum=152, it means the server sent messages 125 to 151 before becoming aware of the disconnect.

My issue happens afterwards. My client sends a ResendRequest (35=2) with BeginSeqNo 7=125 and EndSeqNo 16=0 (give me everything). During this retransmission, and before it finishes, the FIX server sends me a new message with SeqNum=153.
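
For reference, here is a rough sketch of what such a ResendRequest looks like on the wire. The CompIDs, the FIX version, and the SendingTime value are placeholders chosen for illustration, and the helper name `build_resend_request` is hypothetical; in practice QuickFIX/n builds and sends this message itself.

```python
SOH = "\x01"  # FIX field delimiter


def build_resend_request(sender, target, seq_num, begin_seq_no, end_seq_no=0,
                         sending_time="20240101-00:00:00",
                         begin_string="FIX.4.4"):
    """Return a raw ResendRequest (35=2) for begin_seq_no..end_seq_no."""
    body_fields = [
        (35, "2"),           # MsgType = ResendRequest
        (49, sender),        # SenderCompID (placeholder value)
        (56, target),        # TargetCompID (placeholder value)
        (34, seq_num),       # MsgSeqNum of the request itself
        (52, sending_time),  # SendingTime (placeholder value)
        (7, begin_seq_no),   # BeginSeqNo
        (16, end_seq_no),    # EndSeqNo, 0 = everything from BeginSeqNo onwards
    ]
    body = "".join(f"{tag}={value}{SOH}" for tag, value in body_fields)
    # BodyLength (9) counts the bytes between the 9= field and the 10= field.
    header = f"8={begin_string}{SOH}9={len(body)}{SOH}"
    # CheckSum (10) is the byte sum of everything before it, modulo 256.
    checksum = sum((header + body).encode("ascii")) % 256
    return f"{header}{body}10={checksum:03d}{SOH}"


# Show the message with '|' instead of the unprintable SOH delimiter.
print(build_resend_request("CLIENT", "SERVER", 5, 125).replace(SOH, "|"))
```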

So what my client gets is:

- Disconnects with last message 124
- Reconnects
- Receive 152
- Ask for resend from 125 to 0 (everything from 125 onwards)
- Receive 125
- Receive 126
- Receive 127
- Receive 153 (35=8) <-- this makes the retransmission abort on my side
- Ask for resend from 128 to 0
---> if the number of messages to resend is too high and new messages keep coming in,
     my client never manages to get the full retransmission in one go.

When talking with the other party (responsible for the server), they say it's OK to continue sending new messages during retransmission and that I should cache them until retransmission is finished.

It seems this is not the way QuickFIX/n implements it (I found no option to handle this specific case), and when looking at the FIX documentation I can't find any info about this cache procedure. I also assume that this cache procedure is quite complex, as I should probably cache only for a given time (otherwise I may wait forever for missing messages).

My question is simple: What is this cache procedure and where can I find specs about it? And, is this handled by QuickFIX libraries or should I implement something specific on top of it?
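
For illustration, the caching the counterparty describes could look roughly like the sketch below. This is only the general idea, not something taken from the FIX specification or from QuickFIX/n: the class name `GapBuffer` and its fields are hypothetical, and it deliberately leaves out the timeout question raised above as well as PossDupFlag handling.

```python
class GapBuffer:
    """Hold messages that arrive ahead of sequence until the gap is filled."""

    def __init__(self, next_expected):
        self.next_expected = next_expected  # next sequence number we can process
        self.pending = {}                   # seq_num -> message parked for later
        self.delivered = []                 # messages passed on, in sequence order

    def on_message(self, seq_num, message):
        if seq_num < self.next_expected:
            return  # already processed; a real engine would inspect PossDupFlag (43)
        self.pending.setdefault(seq_num, message)
        # Drain every parked message that is now contiguous with what we have.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


# Replay of the scenario above: 152 arrives first, 153 arrives mid-resend.
buf = GapBuffer(next_expected=125)
for seq in (152, 125, 126, 127, 153, 128):
    buf.on_message(seq, f"msg-{seq}")

print(buf.delivered)        # ['msg-125', 'msg-126', 'msg-127', 'msg-128']
print(sorted(buf.pending))  # [152, 153]
```

In this sketch, 152 and 153 stay parked until 129 to 151 have arrived, so a live message in the middle of the replay does not interrupt it.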


Solution

  • After digging a bit more, we finally found that the real issue was my client asking for the same retransmission again and again.

    For instance, if I'm 4000 sequence numbers behind and I send a new resend request each time there is a sequence discrepancy (let's say every 10 messages), I may end up sending roughly 400 resend requests, each for about 2000 messages on average.

    This puts a heavy load on the server and only makes things worse.

    QuickFIX/J has an option that is also available in QuickFIX/n (though undocumented there): SendRedundantResendRequests. Setting it to false makes sure your client does not ask twice for the same retransmission. This greatly lowers the pressure on the server and eases the reconnection.
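
To get a feel for the numbers, here is a toy model of the situation. It is not QuickFIX code: the function name `count_resend_requests` and its parameters are made up for illustration, and it assumes one live message is interleaved after every `live_every` replayed messages and that each redundant request asks for everything still missing.

```python
def count_resend_requests(gap_size, live_every, send_redundant):
    """Return (requests_sent, messages_requested) for one gap recovery.

    Toy model only: a live (too-high) message shows up after every
    `live_every` replayed messages while the gap is being filled.
    """
    requests = 1                   # initial request when the gap is detected
    messages_requested = gap_size  # that first request covers the whole gap
    for replayed in range(1, gap_size):
        if send_redundant and replayed % live_every == 0:
            # Live message seen mid-replay: ask again for what is still missing.
            requests += 1
            messages_requested += gap_size - replayed
    return requests, messages_requested


print(count_resend_requests(4000, 10, send_redundant=True))   # (400, 802000)
print(count_resend_requests(4000, 10, send_redundant=False))  # (1, 4000)
```

Under these assumptions, the redundant requests ask the server for about 800,000 messages in total to recover a 4000-message gap, against 4000 when the range is requested only once.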