websocket, spring-websocket, sockjs

How does SockJS select the protocol?


I'll first lay out what I already know, as it may be useful for the rest.

It is clear that you can instantiate SockJS with a subset of protocols:

var sockJsProtocols = ["xhr-streaming", "xhr-polling" /* , ... */];
socket.client = new SockJS(url, null, {transports: sockJsProtocols});

I also found out that when SockJS calls the /info endpoint, the server returns websocket: true or false (depending on whether the server supports websockets).
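
For reference, a typical /info response body looks something like this (the exact field set can vary by server implementation, and the entropy value is a random number generated per request):

{"websocket": true, "origins": ["*:*"], "cookie_needed": false, "entropy": 2128940704}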

I am also aware of the nice table here with three columns: websockets, streaming, and polling.

But I have 2 questions:

1) How does SockJS decide between polling, streaming, and websockets? It can't be based on browser compatibility alone, since I see lots of XHR streaming sessions from recent browsers in our logs. How is it that this is not documented anywhere?

2) Why does SockJS need to call /info every time? The server's capabilities will always be the same.


Solution

  • OK, I had to read the source code of the SockJS client to understand how it works. Now I do. This answer is valid, at least for SockJS 1.3.

    SockJS measures the RTT (round-trip time) of the /info call, then derives an RTO (retransmission timeout) from it: if the RTT is less than 100 ms, it simply adds 300 ms to it; otherwise, it multiplies it by 4.

    The RTO is then multiplied by the number of round trips each transport needs, which for websockets is apparently 2, as sketched below.
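
    In code, the calculation looks roughly like this (a sketch based on the sockjs-client 1.x sources; the exact names and file layout may differ between versions):

        // RTO derived from the measured /info round-trip time
        // (sketch of countRTO from sockjs-client 1.x)
        function countRTO(rtt) {
          if (rtt > 100) {
            return 4 * rtt;   // slow link: multiply the RTT by 4
          }
          return 300 + rtt;   // fast link: add a fixed 300 ms padding
        }

        // Each transport declares how many round trips it needs
        // (apparently 2 for the websocket transport), so its timeout budget is:
        var rtt = 50;                                         // measured on the /info call
        var websocketRoundTrips = 2;
        var timeoutMs = countRTO(rtt) * websocketRoundTrips;  // 700 ms here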

    The result is used as a timeout that starts when the websocket connection is opened and runs until the open frame (the STOMP CONNECTED frame) is received. When it expires, SockJS simply closes that transport's socket and tries the next one. If no transports are left, it retries the whole list from the beginning.
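
    The fallback behaviour then amounts to something like the following simplified sketch (illustrative only, not the actual sockjs-client source; the function and variable names are made up):

        // Try each transport in turn, giving it rto * roundTrips milliseconds
        // to establish the connection before falling back to the next one.
        function attemptTransports(url, transportList, rto) {
          var Transport = transportList.shift();
          if (!Transport) {
            return; // nothing left; SockJS closes and starts over with the full list
          }
          var transport = new Transport(url);
          var timer = setTimeout(function () {
            transport.close();                          // took too long: give up on it
            attemptTransports(url, transportList, rto); // try the next transport
          }, rto * Transport.roundTrips);
          transport.onopen = function () {
            clearTimeout(timer);                        // opened in time, keep it
          };
        }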

    The whole logic feels empirical and speculative, presumably based on experiments done by the team. Why 300 ms? Why is the websocket round-trip count 2? What is even more surprising is that you can't really override these values on either the client or the server side (the server could have been allowed to override them in the /info response).

    This effectively means that if your STOMP server is too slow creating a queue, clients will be lost in negotiation.

    Let's assume your client is not too far from the servers; then it will most probably fall into the < 100 ms RTT case. Therefore, the client has to receive the CONNECTED frame within roughly 700-800 ms.
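
    For example, with a measured RTT of 50 ms the RTO comes out to 300 + 50 = 350 ms, and the websocket attempt gets 2 × 350 = 700 ms before SockJS closes it and falls back to the next transport.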

    I can't find another way to work around this other than using a filter to slow down the processing of the /info call by an amount calculated empirically from our measured average lag to the broker relay: a slower /info response inflates the measured RTT and, with it, the RTO that all the transport timeouts are derived from.