ActiveMQ Artemis is not responsive after sending many messages


I'm using Apache ActiveMQ Artemis 2.29.0. The broker seems to work fine when I send about 5-10 messages per second. However, when I send many more messages to the queue using Spring Boot's JmsTemplate (30,000 messages in parallel), the broker hangs and no longer responds to CLI commands.

Running artemis queue stat returns the following error, although the server is still running (port 61616 still shows up in netstat):

Connection failed::Failed to create session factory

The Artemis log does not give any useful reason:

2023-12-15 10:28:32,635 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:43:1
2023-12-15 10:28:32,635 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:43:1
2023-12-15 10:28:32,635 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:43:2
2023-12-15 10:28:32,635 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:43:2
2023-12-15 10:28:32,665 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session dc2ee8cc-9af9-11ee-b2fc-065c8c797d3f
2023-12-15 10:28:32,665 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session dc2ee8cc-9af9-11ee-b2fc-065c8c797d3f
2023-12-15 10:28:32,666 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:45:-1
2023-12-15 10:28:32,666 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:45:-1
2023-12-15 10:28:32,666 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:45:1
2023-12-15 10:28:32,666 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:45:1
2023-12-15 10:28:32,666 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:45:2
2023-12-15 10:28:32,666 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:45:2
2023-12-15 10:28:32,667 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session dc2f0fe0-9af9-11ee-b2fc-065c8c797d3f
2023-12-15 10:28:32,667 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session dc2f0fe0-9af9-11ee-b2fc-065c8c797d3f
2023-12-15 10:28:32,668 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:47:-1
2023-12-15 10:28:32,668 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:47:-1
2023-12-15 10:28:32,668 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:47:1
2023-12-15 10:28:32,668 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:47:1
2023-12-15 10:28:32,668 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:HHT9104APP01-23502-1702610838169-1:47:2
2023-12-15 10:28:32,668 WARN  [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:HHT9104APP01-23502-1702610838169-1:47:2
2023-12-15 10:28:32,672 WARN  [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session dc309684-9af9-11ee-b2fc-065c8c797d3f

This is my broker.xml:

<?xml version='1.0'?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xi="http://www.w3.org/2001/XInclude"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">

   <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="urn:activemq:core ">

      <name>10.0.222.2</name>


      <persistence-enabled>true</persistence-enabled>

      <!-- It is recommended to keep this value as 1, maximizing the number of records stored about redeliveries.
           However if you must preserve state of individual redeliveries, you may increase this value or set it to -1 (infinite). -->
      <max-redelivery-records>1</max-redelivery-records>

      <!-- this could be ASYNCIO, MAPPED, NIO
           ASYNCIO: Linux Libaio
           MAPPED: mmap files
           NIO: Plain Java Files
       -->
      <journal-type>ASYNCIO</journal-type>

      <paging-directory>data/paging</paging-directory>

      <bindings-directory>data/bindings</bindings-directory>

      <journal-directory>data/journal</journal-directory>

      <large-messages-directory>data/large-messages</large-messages-directory>


      <!-- if you want to retain your journal uncomment this following configuration.

      This will allow your system to keep 7 days of your data, up to 10G. Tweak it accordingly to your use case and capacity.

      it is recommended to use a separate storage unit from the journal for performance considerations.

      <journal-retention-directory period="7" unit="DAYS" storage-limit="10G">data/retention</journal-retention-directory>

      You can also enable retention by using the argument journal-retention on the `artemis create` command -->



      <journal-datasync>true</journal-datasync>

      <journal-min-files>2</journal-min-files>

      <journal-pool-files>10</journal-pool-files>

      <journal-device-block-size>4096</journal-device-block-size>

      <journal-file-size>10M</journal-file-size>

      <!--
       This value was determined through a calculation.
       Your system could perform 125 writes per millisecond
       on the current journal configuration.
       That translates as a sync write every 8000 nanoseconds.

       Note: If you specify 0 the system will perform writes directly to the disk.
             We recommend this to be 0 if you are using journalType=MAPPED and journal-datasync=false.
      -->
      <journal-buffer-timeout>8000</journal-buffer-timeout>


      <!--
        When using ASYNCIO, this will determine the writing queue depth for libaio.
       -->
      <journal-max-io>4096</journal-max-io>
      <!--
        You can verify the network health of a particular NIC by specifying the <network-check-NIC> element.
         <network-check-NIC>theNicName</network-check-NIC>
        -->

      <!--
        Use this to use an HTTP server to validate the network
         <network-check-URL-list>http://www.apache.org</network-check-URL-list> -->

      <!-- <network-check-period>10000</network-check-period> -->
      <!-- <network-check-timeout>1000</network-check-timeout> -->

      <!-- this is a comma separated list, no spaces, just DNS or IPs
           it should accept IPV6

           Warning: Make sure you understand your network topology as this is meant to validate if your network is valid.
                    Using IPs that could eventually disappear or be partially visible may defeat the purpose.
                    You can use a list of multiple IPs, and if any successful ping will make the server OK to continue running -->
      <!-- <network-check-list>10.0.0.1</network-check-list> -->

      <!-- use this to customize the ping used for ipv4 addresses -->
      <!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->

      <!-- use this to customize the ping used for ipv6 addresses -->
      <!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->



    <connectors>
        <!-- Connector used to be announced through cluster connections and notifications -->
        <connector name="artemis">tcp://10.0.222.2:61616</connector>
        <connector name="netty-connector">tcp://10.0.222.2:61619</connector>
    </connectors>



      <!-- how often we are looking for how many bytes are being used on the disk in ms -->
      <disk-scan-period>5000</disk-scan-period>

      <!-- once the disk hits this limit the system will block, or close the connection in certain protocols
           that won't support flow control. -->
      <max-disk-usage>90</max-disk-usage>

      <!-- should the broker detect dead locks and other issues -->
      <critical-analyzer>true</critical-analyzer>

      <critical-analyzer-timeout>120000</critical-analyzer-timeout>

      <critical-analyzer-check-period>60000</critical-analyzer-check-period>

      <critical-analyzer-policy>LOG</critical-analyzer-policy>


      <page-sync-timeout>20000</page-sync-timeout>


      <!-- the system will enter into page mode once you hit this limit. This is an estimate in bytes of how much the messages are using in memory

      The system will use half of the available memory (-Xmx) by default for the global-max-size.
      You may specify a different value here if you need to customize it to your needs.

      <global-max-size>100Mb</global-max-size> -->

      <!-- the maximum number of messages accepted before entering full address mode.
           if global-max-size is specified the full address mode will be specified by whatever hits it first. -->
      <global-max-messages>-1</global-max-messages>

      <acceptors>

         <!-- useEpoll means: it will use Netty epoll if you are on a system (Linux) that supports it -->
         <!-- amqpCredits: The number of credits sent to AMQP producers -->
         <!-- amqpLowCredits: The server will send the # credits specified at amqpCredits at this low mark -->
         <!-- amqpDuplicateDetection: If you are not using duplicate detection, set this to false
                                      as duplicate detection requires applicationProperties to be parsed on the server. -->
         <!-- amqpMinLargeMessageSize: Determines how many bytes are considered large, so we start using files to hold their data.
                                       default: 102400, -1 would mean to disable large message control -->

         <!-- Note: If an acceptor needs to be compatible with HornetQ and/or Artemis 1.x clients add
                    "anycastPrefix=jms.queue.;multicastPrefix=jms.topic." to the acceptor url.
                    See https://issues.apache.org/jira/browse/ARTEMIS-1644 for more information. -->


         <!-- Acceptor for every supported protocol -->
         <acceptor name="artemis">tcp://10.0.222.2:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;supportAdvisory=false;suppressInternalManagementObjects=false</acceptor>

         <!-- MQTT Acceptor -->
         <acceptor name="netty-acceptor">tcp://10.0.222.2:61619</acceptor>

      </acceptors>


      <cluster-user>admin</cluster-user>

      <cluster-password>*******</cluster-password>

      <broadcast-groups>
         <broadcast-group name="bg-group1">
            <group-address>231.7.7.7</group-address>
            <group-port>9876</group-port>
            <broadcast-period>5000</broadcast-period>
            <connector-ref>artemis</connector-ref>
         </broadcast-group>
      </broadcast-groups>

      <discovery-groups>
         <discovery-group name="dg-group1">
            <group-address>231.7.7.7</group-address>
            <group-port>9876</group-port>
            <refresh-timeout>10000</refresh-timeout>
         </discovery-group>
      </discovery-groups>

      <cluster-connections>
         <cluster-connection name="my-cluster">
            <connector-ref>artemis</connector-ref>
            <check-period>1000</check-period>
            <connection-ttl>5000</connection-ttl>
            <min-large-message-size>50000</min-large-message-size>
            <call-timeout>5000</call-timeout>
            <retry-interval>500</retry-interval>
            <retry-interval-multiplier>1.0</retry-interval-multiplier>
            <max-retry-interval>5000</max-retry-interval>
            <initial-connect-attempts>-1</initial-connect-attempts>
            <reconnect-attempts>-1</reconnect-attempts>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <confirmation-window-size>32000</confirmation-window-size>
            <call-failover-timeout>30000</call-failover-timeout>
            <notification-interval>1000</notification-interval>
            <notification-attempts>2</notification-attempts>
            <discovery-group-ref discovery-group-name="dg-group1"/>
         </cluster-connection>
      </cluster-connections>


      <ha-policy>
         <replication>
            <master>
                <check-for-live-server>true</check-for-live-server>
            </master>
         </replication>
      </ha-policy>

      <security-settings>
         <security-setting match="#">
            <permission type="createNonDurableQueue" roles="amq"/>
            <permission type="deleteNonDurableQueue" roles="amq"/>
            <permission type="createDurableQueue" roles="amq"/>
            <permission type="deleteDurableQueue" roles="amq"/>
            <permission type="createAddress" roles="amq"/>
            <permission type="deleteAddress" roles="amq"/>
            <permission type="consume" roles="amq"/>
            <permission type="browse" roles="amq"/>
            <permission type="send" roles="amq"/>
            <!-- we need this otherwise ./artemis data imp wouldn't work -->
            <permission type="manage" roles="amq"/>
         </security-setting>
      </security-settings>

      <address-settings>
         <!-- if you define auto-create on certain queues, management has to be auto-create -->
         <address-setting match="activemq.management#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <redelivery-delay>0</redelivery-delay>
            <!-- with -1 only the global-max-size is in use for limiting -->
            <max-size-bytes>-1</max-size-bytes>
            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
         </address-setting>
         <!--default for catch all-->
         <address-setting match="#">
            <dead-letter-address>DLQ</dead-letter-address>
            <expiry-address>ExpiryQueue</expiry-address>
            <expiry-delay>120000</expiry-delay>
            <redelivery-delay>0</redelivery-delay>

            <message-counter-history-day-limit>10</message-counter-history-day-limit>
            <address-full-policy>PAGE</address-full-policy>
            <auto-create-queues>true</auto-create-queues>
            <auto-create-addresses>true</auto-create-addresses>
            <auto-delete-queues>false</auto-delete-queues>
            <auto-delete-addresses>false</auto-delete-addresses>

            <!-- The size of each page file -->
            <page-size-bytes>10M</page-size-bytes>

            <!-- When we start applying the address-full-policy, e.g paging -->
            <!-- Both are disabled by default, which means we will use the global-max-size/global-max-messages  -->
            <max-size-bytes>-1</max-size-bytes>
            <max-size-messages>-1</max-size-messages>

            <!-- When we read from paging into queues (memory) -->

            <max-read-page-messages>-1</max-read-page-messages>
            <max-read-page-bytes>20M</max-read-page-bytes>

            <!-- Limit on paging capacity before starting to throw errors -->

            <page-limit-bytes>-1</page-limit-bytes>
            <page-limit-messages>-1</page-limit-messages>
          </address-setting>
      </address-settings>

      <addresses>
         <address name="DLQ">
            <anycast>
               <queue name="DLQ" />
            </anycast>
         </address>
         <address name="ExpiryQueue">
            <anycast>
               <queue name="ExpiryQueue" />
            </anycast>
         </address>

      </addresses>


      <!-- Uncomment the following if you want to use the standard LoggingActiveMQServerPlugin plugin to log events
      <broker-plugins>
         <broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
            <property key="LOG_ALL_EVENTS" value="true"/>
            <property key="LOG_CONNECTION_EVENTS" value="true"/>
            <property key="LOG_SESSION_EVENTS" value="true"/>
            <property key="LOG_CONSUMER_EVENTS" value="true"/>
            <property key="LOG_DELIVERING_EVENTS" value="true"/>
            <property key="LOG_SENDING_EVENTS" value="true"/>
            <property key="LOG_INTERNAL_EVENTS" value="true"/>
         </broker-plugin>
      </broker-plugins>
      -->

   </core>
</configuration>

And this is the code I use to send and receive messages between clients:

boolean isSendSuccessful = ActiveMQRouterService.sendAMQ(serialNumber, webSocketUser, ApplicationConstants.CPE_TOPIC, rpc, null, jmsTemplate);
if (isSendSuccessful) {
    JMSEnvelope envelope = (JMSEnvelope) jmsTemplate.receiveSelectedAndConvert(ApplicationConstants.API_TOPIC,
            "(SerialNumber='" + serialNumber + "') AND (WebSocketUser='" + webSocketUser + "')");
}

JMSEnvelop envelop = (JMSEnvelop) jmsTemplate.receiveSelectedAndConvert("CPE", "SerialNumber = '" + serialNumber + "'");

...

jmsTemplate.convertAndSend("API", envelop, message -> {
    message.setStringProperty("SerialNumber", serialNumber);
    message.setStringProperty("WebSocketUser", webSocketUser);
    message.setJMSExpiration(30000);
    return message; // a MessagePostProcessor lambda must return the message
});

Here, ActiveMQRouterService.sendAMQ calls jmsTemplate.convertAndSend.

Things seem to work fine from the CLI, though, for example with ./artemis perf client:

Connection brokerURL = tcp://10.0.222.4:61616
2023-12-19 15:44:59,294 WARN  [org.apache.activemq.artemis.core.client] AMQ212053: CompletionListener/SendAcknowledgementHandler used with confirmationWindowSize=-1. Enable confirmationWindowSize to receive acks from server!

--- warmup false
--- sent:         42868 msg/sec
--- blocked:      42879 msg/sec
--- completed:    42883 msg/sec
--- received:     42885 msg/sec

--- warmup false
--- sent:         53837 msg/sec
--- blocked:      53837 msg/sec
--- completed:    53837 msg/sec
--- received:     53838 msg/sec

--- warmup false
--- sent:         48468 msg/sec
--- blocked:      48467 msg/sec
--- completed:    48468 msg/sec
--- received:     48468 msg/sec

--- warmup false
--- sent:         46742 msg/sec
--- blocked:      46743 msg/sec
--- completed:    46742 msg/sec
--- received:     46743 msg/sec

--- warmup false
--- sent:         53762 msg/sec
--- blocked:      53762 msg/sec
--- completed:    53763 msg/sec
--- received:     53763 msg/sec

--- warmup false
--- sent:         52831 msg/sec
--- blocked:      52831 msg/sec
--- completed:    52831 msg/sec
--- received:     52803 msg/sec

--- warmup false
--- sent:         52267 msg/sec
--- blocked:      52267 msg/sec
--- completed:    52267 msg/sec
--- received:     52294 msg/sec

--- SUMMARY
--- result:              success
--- total sent:           361181
--- total blocked:        361180
--- total completed:      361181
--- total received:       361181
--- aggregated send time:       mean:     14.63 us - 50.00%:     13.00 us - 90.00%:     19.00 us - 99.00%:     33.00 us - 99.90%:    263.00 us - 99.99%:   1759.00 us - max:     27647.00 us
--- aggregated transfer time:   mean:    127.24 us - 50.00%:     66.00 us - 90.00%:    109.00 us - 99.00%:   1599.00 us - 99.90%:   5375.00 us - 99.99%:   9471.00 us - max:     26495.00 us

Machine resources should not be the problem, as there is plenty of RAM, CPU, and disk space left.

Any help will be appreciated. Sorry if my English is hard to understand. It is not my first language.


Solution

  • It turned out we were still using org.apache.activemq.ActiveMQConnectionFactory, the connection factory from ActiveMQ Classic. We had not updated the ConnectionFactory when we migrated from ActiveMQ Classic to ActiveMQ Artemis, because we thought the library was fully compatible.

    Updating to org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory solved the problem.
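
    Both factories share the simple class name ActiveMQConnectionFactory, so a leftover Classic import or dependency is easy to miss after a migration. As a rough aid, the sketch below checks which of the two fully qualified classes named above can be loaded by the application. It uses only the JDK; the class name JmsClientClasspathCheck and its helper methods are hypothetical names made up for this illustration, and the check only tells you which client libraries are present, not which factory your JmsTemplate is actually wired to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of ActiveMQ or Spring): reports which
// ActiveMQConnectionFactory implementations are visible on the classpath.
class JmsClientClasspathCheck {

    // Fully qualified names taken from the answer above.
    static final String CLASSIC_FACTORY = "org.apache.activemq.ActiveMQConnectionFactory";
    static final String ARTEMIS_FACTORY = "org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory";

    // Returns true if the named class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, JmsClientClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps each factory class name to whether it was found.
    static Map<String, Boolean> report() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put(CLASSIC_FACTORY, isOnClasspath(CLASSIC_FACTORY));
        result.put(ARTEMIS_FACTORY, isOnClasspath(ARTEMIS_FACTORY));
        return result;
    }

    public static void main(String[] args) {
        report().forEach((name, present) ->
                System.out.println((present ? "found:   " : "missing: ") + name));
        if (isOnClasspath(CLASSIC_FACTORY)) {
            System.out.println("The ActiveMQ Classic client is on the classpath; "
                    + "check which factory your JmsTemplate is built from.");
        }
    }
}
```

    Run it with the same classpath as the application. If the Classic factory is reported as found, it is worth checking the imports of whichever class creates the ConnectionFactory bean.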