java websocket push-diffusion

Disconnecting a single client disconnects many other clients


I’m testing a Diffusion solution in our pre-production environment. The solution gives anonymous clients 10 minutes of free access, after which they must authenticate or be disconnected. This works fine in development and early testing, but in pre-production, when one client is disconnected, we see many other clients disconnected at the same moment for no apparent reason. With logging set to FINEST, the log file says:

2016-03-21 11:57:36.557|DEBUG|Diffusion: InboundThreadPool Thread_4||NIOBufferedChannel@52e2a219[connected local=/10.0.4.1:8080 remote=/10.0.1.99:58673] : Closed(UNEXPECTED_ERROR) Unexpected error EOF|com.pushtechnology.diffusion.io.message.MessageChannelException
2016-03-21 11:57:36.558|DEBUG|Diffusion: InboundThreadPool Thread_4||Java Client 50328FF242799CD4-000000000000015A AWAITING_RECONNECTION@10.0.1.99: State changed from CONNECTED to AWAITING_RECONNECTION.|com.pushtechnology.diffusion.clients.impl.ClientImpl
2016-03-21 11:57:36.558|DEBUG|Diffusion: InboundThreadPool Thread_4||Java Client 50328FF242799CD4-000000000000015A AWAITING_RECONNECTION@10.0.1.99: CONNECTION_LOST keeping alive for 60000 ms.|com.pushtechnology.diffusion.clients.impl.ClientImpl

The affected clients are always browsers, never smartphones, and are often older browsers such as IE9.


Solution

  • I'm guessing that your pre-production environment has a load balancer that is configured to use connection pooling. Versions of IE before 10 do not support WebSockets, so they fall back to XHR long polling. Your smartphone clients will be using WebSockets, so they are unaffected.

    The manual has this to say in the section "Considerations when using load balancers":

    Do not use connection pooling for connections between the load balancer and the Diffusion server. If multiple client connections are multiplexed through a single server-side connection, this can cause client connections to be prematurely closed.

    In Diffusion, a client is associated with a single TCP/HTTP connection for the lifetime of that connection. If a Diffusion server closes a client, the connection is also closed. Diffusion makes no distinction between a single client connection and a multiplexed connection, so when a client sharing a multiplexed connection closes, the connection between the load balancer and Diffusion is closed, and subsequently all of the client-side connections multiplexed through that server-side connection are closed.

    To illustrate the problem: when a Diffusion server has a direct connection to each member of its audience (Alice, Bob and Charlie), closing Bob's connection is straightforward and affects nobody else.

    [Figure: Diffusion and audience with no load balancer]

    When a connection-pooling middlebox (a proxy or load balancer) enters the mix, closing Bob's connection also disconnects Alice and Charlie.

    [Figure: Diffusion and audience with connection-pooling middlebox]

    So, whereas connection pooling is a good idea for regular HTTP servers, it is problematic for a Diffusion server that serves XHR polling clients and needs to disconnect individual clients.
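
The effect described above can be reproduced with a toy model. The sketch below is not Diffusion code and uses no Diffusion API: the class `PoolingDemo`, its methods, and the connection ids are hypothetical names invented for illustration. It only assumes what the manual states, namely that the server closes the whole server-side connection when it closes a client, so every client carried over that connection is dropped with it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of server-side connections; hypothetical, not part of any Diffusion API. */
final class PoolingDemo {

    /** No load balancer: each client has its own server-side connection. */
    static Map<String, Set<String>> directConnections(List<String> clients) {
        Map<String, Set<String>> connections = new LinkedHashMap<>();
        for (String client : clients) {
            Set<String> carried = new TreeSet<>();
            carried.add(client);
            connections.put("conn-" + client, carried);
        }
        return connections;
    }

    /** Connection-pooling middlebox: all clients share one server-side connection. */
    static Map<String, Set<String>> pooledConnection(List<String> clients) {
        Map<String, Set<String>> connections = new LinkedHashMap<>();
        connections.put("conn-pooled", new TreeSet<>(clients));
        return connections;
    }

    /**
     * The server closes the connection that carries the given client.
     * Returns every client that loses its connection as a result.
     */
    static Set<String> closeClient(Map<String, Set<String>> connections, String client) {
        Iterator<Map.Entry<String, Set<String>>> it = connections.entrySet().iterator();
        while (it.hasNext()) {
            Set<String> carried = it.next().getValue();
            if (carried.contains(client)) {
                it.remove(); // the whole server-side connection goes away
                return carried;
            }
        }
        return new TreeSet<>(); // unknown client: nothing is closed
    }

    public static void main(String[] args) {
        List<String> audience = Arrays.asList("Alice", "Bob", "Charlie");
        System.out.println("Direct: closing Bob drops "
                + closeClient(directConnections(audience), "Bob"));
        System.out.println("Pooled: closing Bob drops "
                + closeClient(pooledConnection(audience), "Bob"));
    }
}
```

Run as written, the direct case reports only `[Bob]`, while the pooled case reports `[Alice, Bob, Charlie]`, which matches the mass disconnections seen in pre-production.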