Tags: mqtt, mosquitto, mqtt-vernemq

MQTT bridge with Sparkplug B -> NDEATH scenarios not working as expected


I have two machines and am testing MQTT bridge connections with Sparkplug B payloads.

I have the bridges working, but when I simulate the failure points annotated in the image below, the behaviour is not what I expected. I expected an NDEATH to be visible on the broker on Machine B when any of the three points in the image disconnects.

When I kill the publisher or the local MQTT broker on Machine A, I do see the NDEATH as expected when subscribed to the Machine B MQTT broker. But when I pull the plug between Machine A and Machine B (point #3 in the image), I do not see an NDEATH! I have waited well beyond the 60-second keep alive to give it plenty of time to react, which I understand typically takes 1.5x the keep alive interval.

The publisher and broker on Machine A continue to operate, and when the connection at point #3 is brought back online, all messages are delivered. However, I was expecting that with the bridge connection down, any node that had registered a last will and testament (LWT) would produce an NDEATH, regardless of where the connection was lost.

I have tested with mosquitto, vernemq and a little with hive-ce, however hive-ce is severely limited in functionality. Am I missing something with my understanding of MQTT bridging? Shouldn't NDEATH be sent in all three scenarios?

[Image: diagram of the bridge setup with the three failure points annotated]


Solution

  • From the sparkplug spec:

    A critical aspect for MQTT in a real-time SCADA/IIoT application is making sure that the primary MQTT SCADA/IIoT Host Node can know the “STATE” of any EoN node in the infrastructure within the MQTT Keep Alive period (refer to section 3.1.2.10 in the MQTT Specification). To implement the state a known Will Topic and Will Message is defined and specified. The Will Topic and Will Message registered in the MQTT CONNECT session establishment, collectively make up what we are calling the Death Certificate. Note that the delivery of the Death Certificate upon any MQTT client going offline unexpectedly is part of the MQTT protocol specification, not part of this Sparkplug™ specification (refer to section 3.1 CONNECT in the MQTT Specification for further details on how an MQTT Session is established and maintained).

    So, in MQTT terms, NDEATH is a 'Will' which, as mentioned above, is defined in section 3.1 of the MQTT spec:

    If the Will Flag is set to 1 this indicates that, if the Connect request is accepted, a Will Message MUST be stored on the Server and associated with the Network Connection. The Will Message MUST be published when the Network Connection is subsequently closed unless the Will Message has been deleted by the Server on receipt of a DISCONNECT Packet

    In summary, NDEATH is registered as a 'Will', which the MQTT broker publishes if it loses the connection with the publisher (unless a DISCONNECT is received first).

    When you establish a bridge, it relays messages published on whatever topic(s) the bridge is configured to relay. The bridge only communicates published messages, not information about which clients are connected (or any 'Will' they may have set). When the bridged connection goes down, the publisher is still connected to the broker on Machine A, so that broker has no reason to publish the Will, and subscribers on Machine B will not receive the NDEATH.
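
    To make the distinction concrete, here is a toy in-memory model (not a real MQTT client or broker). The `Broker` and `Bridge` classes, the client ID and the topic `spBv1.0/group1/NDEATH/node1` are all made up for illustration, and the real brokers are far more involved. It only shows that the Will lives on the broker the publisher connected to, and that the bridge forwards publications, nothing else:

```python
from dataclasses import dataclass, field

# Hypothetical Sparkplug B topic; group and node names are illustrative only.
NDEATH_TOPIC = "spBv1.0/group1/NDEATH/node1"


@dataclass
class Broker:
    """Toy broker: stores one Will per client and records what it publishes."""
    name: str
    wills: dict = field(default_factory=dict)      # client_id -> (topic, payload)
    published: list = field(default_factory=list)  # messages visible to subscribers
    bridges: list = field(default_factory=list)    # outgoing Bridge objects

    def connect(self, client_id, will=None):
        # The Will is stored on THIS broker, tied to this client's connection.
        if will is not None:
            self.wills[client_id] = will

    def publish(self, topic, payload):
        self.published.append((topic, payload))
        for bridge in self.bridges:
            bridge.forward(topic, payload)

    def connection_lost(self, client_id):
        # Unexpected disconnect of a client: publish its stored Will, if any.
        will = self.wills.pop(client_id, None)
        if will is not None:
            self.publish(*will)


@dataclass
class Bridge:
    """Toy bridge: forwards published messages only, never client state."""
    remote: Broker
    link_up: bool = True

    def forward(self, topic, payload):
        if self.link_up:
            self.remote.publish(topic, payload)


def topics_seen_on_b(scenario):
    a, b = Broker("A"), Broker("B")
    bridge = Bridge(remote=b)
    a.bridges.append(bridge)
    a.connect("edge-node", will=(NDEATH_TOPIC, b"NDEATH"))

    if scenario == "publisher_killed":
        a.connection_lost("edge-node")  # broker A publishes the Will
    elif scenario == "bridge_link_cut":
        bridge.link_up = False          # publisher is still connected to A
    return [topic for topic, _ in b.published]


print(topics_seen_on_b("publisher_killed"))
print(topics_seen_on_b("bridge_link_cut"))
```

    In the second scenario nothing on broker A ever calls `connection_lost`, so there is no publication for the bridge to relay, and broker B never held a Will for the edge node in the first place.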

    Monitoring the connection status of bridges is not covered by the spec, so the options vary from broker to broker. For example, Mosquitto can provide a notification when the bridge connection goes down, using a 'Will' on the bridge connection (see notifications in mosquitto.conf). This may give you a way to get the information you need.
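
    If you go the Mosquitto notifications route, something on the Machine B side has to interpret the state messages. As far as I know, with `notifications true` the default topic is `$SYS/broker/connection/<clientid>/state` (changeable with `notification_topic`) and the payload is `1` when the bridge is connected and `0` when it is not, but check the mosquitto.conf man page for your version. Below is a minimal sketch of a parser for those messages. The function name and the bridge name in the example are hypothetical, and wiring it into your MQTT client's message callback is left out:

```python
# Assumed defaults from mosquitto.conf; adjust if notification_topic is set.
STATE_TOPIC_PREFIX = "$SYS/broker/connection/"
STATE_TOPIC_SUFFIX = "/state"


def parse_bridge_state(topic, payload):
    """Return (bridge_name, is_up) for a bridge state message, else None."""
    if not (topic.startswith(STATE_TOPIC_PREFIX)
            and topic.endswith(STATE_TOPIC_SUFFIX)):
        return None

    # Everything between the prefix and the suffix is the bridge identifier.
    name = topic[len(STATE_TOPIC_PREFIX):-len(STATE_TOPIC_SUFFIX)]
    if not name:
        return None

    if isinstance(payload, bytes):
        text = payload.decode("utf-8", errors="replace")
    else:
        text = str(payload)
    text = text.strip()

    if text == "1":
        return name, True
    if text == "0":
        return name, False
    return None  # unknown payload: do not guess


# Example with a made-up bridge name.
state = parse_bridge_state("$SYS/broker/connection/machine-a.bridge/state", b"0")
if state is not None and not state[1]:
    print(f"bridge {state[0]} is down; treat nodes behind it as stale")
```

    What you do on a "down" result is an application decision. One option is to mark every Sparkplug node that arrives through that bridge as offline in your host application until the bridge reports `1` again and fresh data arrives.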