
ActiveMQ Artemis shared storage slave fails to start if master is not started


We have a master/slave setup using the shared storage strategy. We observed that if we start the slave while the master is down, the slave logs the following message:

AMQ221032: Waiting to become backup node

And the server does not become live.

This suggests that the slave requires the master to have been up at some point before it can become operational. Is this the expected behavior? Is there a way to let the slave become live at startup if the master is down?


Solution

  • Generally speaking, what you're seeing is not the expected behavior for master/slave using shared storage. If the master is not started and the slave is started, then the slave should acquire the lock on the shared storage and start. I just tested this using the transaction-failover example, which ships with ActiveMQ Artemis, and the backup started just fine when the master wasn't started. Here's the logging I saw when starting the backup while the master wasn't started:

    2022-07-03 21:50:55,955 INFO  [org.apache.activemq.artemis.core.server] AMQ221032: Waiting to become backup node
    2022-07-03 21:50:55,956 INFO  [org.apache.activemq.artemis.core.server] AMQ221033: ** got backup lock
    ...
    2022-07-03 21:50:56,156 INFO  [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.23.0 [0db7f4ea-fb44-11ec-8718-3ce1a1d12939] started, waiting live to fail before it gets active
    ...
    2022-07-03 21:50:56,661 INFO  [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
    

    The behavior you're seeing suggests that another backup may already be started and may have acquired the backup lock on the journal. It's hard to say with the information you've provided.
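
    For comparison with your own setup, here is a minimal sketch of what the slave's broker.xml might look like for shared storage. The directory paths are hypothetical placeholders, and the element names assume a 2.x release such as 2.23.0 that uses the master/slave terminology. The important point is that every broker in the group points at the same shared directories, since the live and backup locks are held there:

    ```xml
    <!-- Slave broker.xml (sketch). All /mnt/shared/... paths are hypothetical. -->
    <core xmlns="urn:activemq:core">
       <!-- Master and slave must use the same directories on the shared storage -->
       <journal-directory>/mnt/shared/artemis/journal</journal-directory>
       <bindings-directory>/mnt/shared/artemis/bindings</bindings-directory>
       <paging-directory>/mnt/shared/artemis/paging</paging-directory>
       <large-messages-directory>/mnt/shared/artemis/large-messages</large-messages-directory>

       <ha-policy>
          <shared-store>
             <slave>
                <!-- Hand control back to the master when it returns -->
                <allow-failback>true</allow-failback>
             </slave>
          </shared-store>
       </ha-policy>
    </core>
    ```

    If more than one broker is configured as a slave against the same directories, only one of them can hold the backup lock at a time, which would be consistent with a second slave sitting at AMQ221032.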