EAP 7 JMS cluster not in sync. Scheduled JMS message blocked when node is down

I'm setting up an EAP 7 cluster in standalone mode. I followed this tutorial and set up my cluster.

Then I started testing the JMS system with a simple JMS app. Each time I send a JMS message, I see the message count updated on only one of the nodes (instead of on both nodes, as shown in the video). The total number of messages sent equals the sum of the counts from both nodes.

However, because the nodes are clustered, I would expect the JMS statistics to be in sync (as shown in the video), so both nodes should display the total number of messages received by the cluster instead of only part of them.

Also, when sending a scheduled message, if the node that holds the message dies, the message is blocked until the dead node is restarted. This is unacceptable: I would expect the scheduled message to be delivered by the other (running) node.

All tests are performed using the default standalone-full-ha.xml

Here are all the steps to reproduce the issue:

Environment Setup

Download EAP 7.1/7.2 or WildFly 12/14 and unzip it to a directory
Rename the directory to my-dir-node1
Copy my-dir-node1 to my-dir-node2
Update the configuration
1. Go to my-dir-node1/standalone/configuration and copy standalone-full-ha.xml to standalone-full-ha-test.xml
2. Edit my-dir-node1/standalone/configuration/standalone-full-ha-test.xml
3. Add name="node1" to the root element: <server xmlns="urn:jboss:domain:5.0" name="node1">
4. Search for <cluster password="${jboss.messaging.cluster.password:CHANGE ME!!}"/> and replace it with <cluster password="${jboss.messaging.cluster.password:mypassword}"/>
5. Add <jms-queue name="JMSTest" entries="java:/jms/queue/test"/> after <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
6. Go to my-dir-node2/standalone/configuration and repeat the above steps. Make sure you name the server "node2" instead of "node1"

Deploy the test app by copying test-jms.war to my-dir-node1/standalone/deployments and my-dir-node2/standalone/deployments

Content of my test app (JSP sender):

<%@ page import="javax.naming.InitialContext" %>
<%@ page import="javax.jms.*" %>
<%@ page import="java.util.logging.Logger" %>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>

<%

    Logger logger = Logger.getLogger("JMSSender");
    InitialContext initialContext = new InitialContext();
    ConnectionFactory factory = (ConnectionFactory) initialContext.lookup("ConnectionFactory");
    Destination destination = (Destination)initialContext.lookup("java:/jms/queue/test");
    Connection connection = factory.createConnection();
    Session session1 = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    MessageProducer messageProducer = session1.createProducer(destination);

    String body = request.getParameter("message");

    if (body == null)
        body = "Hello World!";

    TextMessage message = session1.createTextMessage(body);

    String delay = request.getParameter("delay");

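    // Schedule delivery through the JMS 2.0 producer API. setJMSDeliveryTime is
    // reserved for JMS providers and is not a portable way to delay a message.
    if (delay != null)
        messageProducer.setDeliveryDelay(Long.parseLong(delay));

    messageProducer.send(message);
    connection.close(); // also closes the session and producer created from it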

    logger.info("Send message: " + body);
%>
<html>
  <head>
    <title>Test JMS Sender</title>
  </head>
  <body>
  <h1>Message</h1>
  <p><strong><%=body%></strong></p>
  <p>Add ?message=xxx to the url to change the message.</p>
  <p>Add ?delay=xxx to the url to schedule a delivery at a later time. The unit of delay is in millisecond. ie: 1 second = 1000 </p>
  </body>
</html>

JMS receiver:

import org.apache.log4j.Logger;

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(mappedName = "testQueue", activationConfig =  {
        @ActivationConfigProperty(propertyName = "acknowledgeMode", propertyValue = "Auto-acknowledge")
        , @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
        , @ActivationConfigProperty(propertyName = "destination", propertyValue = "java:/jms/queue/test")
})
public class JMSReceiver implements MessageListener {

    // Logger for the class
    private static Logger logger = Logger.getLogger(JMSReceiver.class.getName());

    @Override
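    public void onMessage(Message message) {
        // Only text messages are expected on this queue; skip anything else.
        if (!(message instanceof TextMessage)) {
            logger.warn("Ignoring non-text message: " + message);
            return;
        }
        TextMessage t = (TextMessage) message;
        try {
            logger.info(t.getText());
        } catch (JMSException e) {
            logger.error("Failed to read message text", e);
        }
    }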
}

web.xml

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_4_0.xsd"
         version="4.0">
    <welcome-file-list>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>
</web-app>

Solution

Your understanding of a messaging cluster in EAP (and of the video you linked) is incorrect. If you send 1 message to a messaging cluster in EAP, then only 1 node in the cluster has that message. Messages are not replicated between the nodes in the cluster, so the JMS statistics for each node will not necessarily be in sync.

What you are seeing is, in fact, the expected behavior. Furthermore, it is what is demonstrated in the video you linked. In the video, the client application sends 2 messages each time it is run. One message goes to one cluster node and the second message goes to the other cluster node. That is why the "Messages Added" metric on each node increases and appears to be in sync. The "Messages Added" metric on each node increases by 1 when 2 messages are sent (1+1=2). The total number of messages added to the queues across the cluster can be determined by summing the "Messages Added" from every node in the cluster.
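To make the arithmetic concrete, here is a small plain-Java sketch (no JMS involved). It assumes a simple round-robin distribution, which is only a stand-in for whatever load balancing your cluster actually applies; the class and method names are hypothetical and chosen for illustration.

```java
import java.util.Arrays;

public class ClusterCountSketch {

    // Hypothetical model: hand each sent message to exactly one node,
    // round-robin, and return the per-node "Messages Added" counters.
    static long[] distribute(int nodes, int messagesSent) {
        long[] added = new long[nodes];
        for (int i = 0; i < messagesSent; i++) {
            added[i % nodes]++; // the message lives on this one node only
        }
        return added;
    }

    // The cluster-wide total is the sum of the per-node counters.
    static long clusterTotal(long[] perNode) {
        return Arrays.stream(perNode).sum();
    }

    public static void main(String[] args) {
        long[] video = distribute(2, 2);  // client in the video: 2 messages per run
        long[] single = distribute(2, 1); // your test: 1 message per request
        System.out.println(Arrays.toString(video) + " total=" + clusterTotal(video));
        System.out.println(Arrays.toString(single) + " total=" + clusterTotal(single));
    }
}
```

With 2 messages per run the counters move together and look "in sync"; with 1 message per request only one counter moves. In both cases the sum over the nodes matches the number of messages sent.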

This behavior is important to understand because it means that if a node in the cluster goes down, then all the messages on that node become unavailable (as you observed), including scheduled messages that have not been delivered yet. If you want messages to remain available in the case of a node failure, then you need to configure a live/backup pair (using either a shared store or replication). Consult the EAP documentation for more details on how to accomplish this.
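As a rough illustration of why the backup matters, here is a toy plain-Java model. It is not EAP or Artemis code, and every name in it is hypothetical. It assumes only that a backup has access to the live server's messages and takes over when the live server stops.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LiveBackupSketch {

    // Toy model of one cluster node and the messages it owns.
    static final class Node {
        final Deque<String> store = new ArrayDeque<>();
        final boolean hasBackup; // assumption: the backup can read the live server's store
        boolean liveUp = true;

        Node(boolean hasBackup) {
            this.hasBackup = hasBackup;
        }

        void send(String message) {
            store.add(message);
        }

        // Messages can be delivered while the live server runs,
        // or after a backup has taken over for it.
        boolean available() {
            return liveUp || hasBackup;
        }

        String receive() {
            return available() ? store.poll() : null;
        }
    }

    public static void main(String[] args) {
        Node solo = new Node(false);
        Node paired = new Node(true);
        solo.send("scheduled-1");
        paired.send("scheduled-1");

        // Both live servers go down before the scheduled delivery time.
        solo.liveUp = false;
        paired.liveUp = false;

        System.out.println("no backup:   " + solo.receive());
        System.out.println("with backup: " + paired.receive());
    }
}
```

In this model the message on the node without a backup stays stuck until that node comes back, which matches what you observed, while the node with a backup can still hand the message out.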