architecture, microservices, distributed-system, saga

Do I need a Saga to ensure a message is always processed?


When my Sales microservice completes a sale, my Order microservice must create an order. A SaleCompleteEvent is raised onto the bus for the Order service to consume when this happens.

However, the Sales service does not need to wait for the Order service, and in some scenarios even a noticeable delay in creating the order would not be much of a problem.

So having a Saga which rolls back the sale if the order cannot be created seems like overkill here. Rather, I just want the Order service to keep trying, and I want to ensure that no message is lost on the bus (even if the bus goes down catastrophically).

Retrying is easy, but I'm trying to research how I would go about making sure a bus message is never lost. If I focus my efforts on this aspect (where something must happen, but it doesn't need to roll back previous steps like a transaction), would we still call this a Saga pattern? Or does it have another name I can use for research?

Is this 'rely on the bus to always recover the message in failure' approach even feasible in practice, or do I always need something to watch and raise new SaleCompleteEvents if something goes wrong?


Solution

  • I don't see the need for a saga in your use case. If creating an order fails because of a temporary problem (e.g. a network glitch), then retrying is the natural strategy. I'll try to answer the questions in the body, though I need to interpret them a bit:

    Retrying is easy, but I'm trying to research how I would go about making sure a bus message is never lost

    I take your requirement to be that a message doesn't get "forgotten" if its processing fails for some reason. Message buses usually offer acknowledgment (that's your keyword) for this: a consumer receives a message, processes it and, on success, acknowledges it, so the bus knows that the message can be discarded. If the consumer fails or crashes, the acknowledgment is never sent, and after some timeout the bus makes the message available to be picked up again.
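
To make the acknowledgment flow concrete, here is a minimal sketch in Python. It is a toy in-memory bus, not the API of any real broker: `InMemoryBus`, `visibility_timeout` and `consume_once` are hypothetical names invented for illustration, and a real bus would also persist messages to disk so they survive a crash.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryBus:
    """Toy bus with at-least-once delivery (hypothetical, illustration only)."""
    visibility_timeout: float = 30.0
    _messages: dict = field(default_factory=dict)         # message id -> payload
    _invisible_until: dict = field(default_factory=dict)  # message id -> time

    def publish(self, payload):
        message_id = str(uuid.uuid4())
        self._messages[message_id] = payload
        return message_id

    def receive(self, now=None):
        """Return (message_id, payload) for the next visible message, or None."""
        now = time.monotonic() if now is None else now
        for message_id, payload in self._messages.items():
            if self._invisible_until.get(message_id, 0.0) <= now:
                # Hide the message from other consumers until the timeout expires.
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, payload
        return None

    def ack(self, message_id):
        """Consumer confirms success; only now does the bus discard the message."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


def consume_once(bus, handler, now=None):
    """Receive one message and acknowledge it only if the handler succeeds."""
    received = bus.receive(now)
    if received is None:
        return False
    message_id, payload = received
    try:
        handler(payload)
    except Exception:
        # No ack is sent, so the message becomes visible again after the timeout.
        return False
    bus.ack(message_id)
    return True
```

If the handler raises (or the consumer process dies before calling `ack`), the message stays on the bus and is handed out again once the timeout has passed, which is the "keep trying" behaviour asked about.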

    Is this 'rely on the bus to always recover the message in failure' approach even feasible in practice, or do I always need something to watch and raise new SaleCompleteEvents if something goes wrong?

    Sending the same event multiple times is wrong from the perspective of an event-based system, which is apparently what you're building. Something like a SaleCompleteEvent is supposed to record something that happened in your domain, and the same sale was clearly completed only once. Sending it twice would create a false history of your domain.
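
Note that a redelivery by the bus is not the same thing as raising a second event: it is the one original event arriving again, for instance because the consumer crashed after creating the order but before acknowledging. A common way to keep that harmless is to make the consumer idempotent. The sketch below assumes each event carries a unique `event_id` assigned once by the Sales service; that field, and the `OrderService` class, are assumptions for illustration and not something stated in the question.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SaleCompleteEvent:
    event_id: str  # hypothetical: assigned once, when the sale completes
    sale_id: int


@dataclass
class OrderService:
    orders: list = field(default_factory=list)
    _seen_event_ids: set = field(default_factory=set)

    def handle(self, event):
        """Create an order unless this exact event was already processed."""
        if event.event_id in self._seen_event_ids:
            return False  # redelivery of the same event: nothing to do
        self.orders.append({"sale_id": event.sale_id})
        self._seen_event_ids.add(event.event_id)
        return True
```

In a real service the set of processed event IDs would presumably live in the same database as the orders and be updated in the same transaction, so that a crash cannot leave one without the other.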