Search code examples
javajmsactivemq-classicfailover

ActiveMQ - Applications unable to connect for a minute on localhost. Eventual reconnect using failover


I'm running an ActiveMQ broker (version 5.15.12 and later downgraded to 5.15.8) on a windows server on which several local applications are running that are connecting. During heavier than average load on ActiveMQ, it happens regularly that the initial connection to ActiveMQ cannot be made by the applications (all (10) application seem to be affected evenly). On average, the web client of ActiveMQ show around 1500 connections and the broker is servicing 91 queues and 0 topics, with some of the queues processing ~100 events a second while other queues barely have any traffic.

The applications attempt to reconnect using the failover mechanism (default configuration). This can sometimes take up to 40 seconds or longer (after the initial reconnections fail, the exponential back-off causes retries to happens after 10, 20 and 40 seconds).

The applications make use of PooledConnectionFactory with a poolsize of 300 (upped to 600 on one of the applications as a test without impact)

This is the URI of the broker: <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=8000&amp;wireFormat.maxFrameSize=104857600"/>

Memory: wrapper.java.maxmemory=8192

Thinking it was a resource issue, the maxConnections has been upped to 12000 and advisory support disabled both without change.

While running ActiveMQ with debug logging enabled, I noticed that in the span of 500ms ActiveMQ closed 242 connections with the following logging:

2022-05-01 07:57:16,599 | DEBUG | Unregistering MBean org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Topic,destinationName=ActiveMQ.Advisory.TempQueue_ActiveMQ.Advisory.TempTopic,endpoint=Consumer,clientId=ID_HIT500SRV201-59070-1650984489617-0_624222,consumerId=ID_HIT500SRV201-59070-1650984489617-1_624223_-1_1 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | localhost removing consumer: ID:HIT500SRV201-59070-1650984489617-1:624223:-1:1 for destination: ActiveMQ.Advisory.TempQueue,ActiveMQ.Advisory.TempTopic | org.apache.activemq.broker.region.AbstractRegion | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | remove connection id: ID:HIT500SRV201-59070-1650984489617-1:624223 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | Publishing: tcp://HIT500SRV201:61616 for broker transport URI: tcp://HIT500SRV201:61616?maximumConnections=12000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.broker.TransportConnector | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | Publishing: tcp://HIT500SRV201:61616 for broker transport URI: tcp://HIT500SRV201:61616?maximumConnections=12000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.broker.TransportConnector | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | Unregistering MBean org.apache.activemq:type=Broker,brokerName=localhost,connector=clientConnectors,connectorName=openwire,connectionViewType=clientId,connectionName=ID_HIT500SRV201-59070-1650984489617-0_624222 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | Unregistering MBean org.apache.activemq:type=Broker,brokerName=localhost,connector=clientConnectors,connectorName=openwire,connectionViewType=remoteAddress,connectionName=tcp_//127.0.0.1_62259 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | Unregistering MBean org.apache.activemq:type=Broker,brokerName=localhost,connector=clientConnectors,connectorName=openwire,connectionViewType=remoteAddress,connectionName=tcp_//127.0.0.1_62259 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ Transport: tcp:///127.0.0.1:62259@61616
2022-05-01 07:57:16,599 | DEBUG | Stopping connection: tcp://127.0.0.1:62259 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-4925514
2022-05-01 07:57:16,599 | DEBUG | Stopping transport tcp:///127.0.0.1:62259@61616 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ BrokerService[localhost] Task-4925514
2022-05-01 07:57:16,614 | DEBUG | Closed socket Socket[addr=/127.0.0.1,port=62259,localport=61616] | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task-1
2022-05-01 07:57:16,614 | DEBUG | Stopped transport: tcp://127.0.0.1:62259 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-4925514
2022-05-01 07:57:16,614 | DEBUG | Connection Stopped: tcp://127.0.0.1:62259 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-4925514

During and after this time, new connections are started. After this, the applications are able to reconnect again. I'm not even sure if this might just be expected behaviour or if it's related to the problem.

Any insights would be appreciated, thanks!


Solution

  • The combinations of our applications, a database and activemq had exhausted the default dynamic port range. See https://learn.microsoft.com/en-us/windows/client-management/troubleshoot-tcpip-port-exhaust