rabbitmq haproxy

Extra TCP connections on the RabbitMQ server after resource alarm


I have RabbitMQ Server 3.6.0 installed on Windows (I know it's time to upgrade; I've already done that on the other server node).

Heartbeats are enabled on both server and client side (heartbeat interval 60s).

I had a resource alarm (RAM limit), and since then I have observed an increase in the number of TCP connections to the RMQ server.

At the moment there are 18000 connections, while the normal number is 6000.

Via the management plugin I can see that there are a lot of connections with 0 channels, while our "normal" connections have at least 1 channel.
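
For anyone wanting to check the same thing without clicking through the UI, here is a rough sketch in Python that pulls the connection list from the management plugin's HTTP API (`GET /api/connections`) and counts connections with 0 channels and connections per state. The URL and credentials in the usage comment are placeholders, and the helper names `fetch_connections` and `summarize` are made up for this example; it assumes each connection object carries `channels` and `state` fields, as the management API reports them.

```python
import base64
import json
import urllib.request
from collections import Counter


def fetch_connections(base_url, user, password):
    """Fetch the connection list from the management plugin's HTTP API."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/connections")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(connections):
    """Return (connections with zero channels, number of connections per state)."""
    idle = [c for c in connections if c.get("channels", 0) == 0]
    by_state = Counter(c.get("state", "unknown") for c in connections)
    return idle, by_state


# Example usage (placeholder URL and credentials, adjust for your node):
#   conns = fetch_connections("http://localhost:15672", "guest", "guest")
#   idle, by_state = summarize(conns)
#   print(len(conns), "connections,", len(idle), "with 0 channels", dict(by_state))
```

With 18000 connections the response will be large, so it is probably better to run this occasionally rather than poll it in a tight loop.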

Even restarting the RMQ server doesn't help: all the connections are re-established.

   1. Does that mean all of them are really alive?

A similar issue was described here: https://github.com/rabbitmq/rabbitmq-server/issues/384, but as far as I can see it was fixed in exactly v3.6.0.

   2. Do I understand correctly that before RMQ Server v3.6.0 the behavior after a resource alarm was that several TCP connections could hang on the server side per 1 real client auto-recovery connection?

Maybe important: we have HAProxy between the server and the clients.

   3. Could HAProxy be an explanation for these extra connections? Maybe it prevents the client from receiving the signal that the connection was closed due to the resource alarm?


Solution

  • I've managed to reproduce the problem: in the end it was a bug in the way our client used RMQ connections. It created 1 auto-recovery connection (which is fine), and sometimes it also created a separate plain connection for "temporary" purposes.

    Steps to reproduce my problem:

    1. Trigger a memory alarm in RabbitMQ (e.g. set an easily reached RAM limit and push a lot of big messages). Connections will be in the "blocking" state.
    2. Start sending messages from our client over this new "temp" connection.
    3. Ensure the connection is in the "blocked" state.
    4. Without clearing the resource alarm, restart the RabbitMQ node.
    5. The "temp" connection was still there, despite the fact that auto-recovery was not enabled for it. And it kept sending heartbeats, so the server didn't close it.

    We will fix the client to always use one and only one connection. And of course we will upgrade Erlang.
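
To illustrate the "one connection only" idea, here is a minimal, library-agnostic sketch in Python. It is not our actual client code: `SharedConnection` and the `factory` argument are hypothetical names, and `factory` stands for whatever call opens a connection in the client library you use. The only assumption about the returned object is that it has a `close()` method.

```python
import threading


class SharedConnection:
    """Lazily opens one connection and hands the same object to every caller."""

    def __init__(self, factory):
        # factory: hypothetical callable that opens and returns a connection
        self._factory = factory
        self._lock = threading.Lock()
        self._connection = None

    def get(self):
        """Return the shared connection, opening it on first use."""
        with self._lock:
            if self._connection is None:
                self._connection = self._factory()
            return self._connection

    def close(self):
        """Close the shared connection, if any, so that nothing is left behind."""
        with self._lock:
            if self._connection is not None:
                self._connection.close()
                self._connection = None
```

Code that used to open a "temp" connection would call `get()` instead and open its own channel on the shared connection, so no extra TCP connection is created that could be left in the "blocked" state.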