c# microservices integration publish-subscribe event-sourcing

How to recover from missed integration or notification events in event driven architecture?

The situation is as follows. There are three services. One is event sourced and publishes integration or notification events (using the outbox pattern) to the other two services (the subscribers) through an event bus such as Azure Service Bus or ActiveMQ.

This design is inspired by .NET microservices - Architecture e-book - Subscribing to events.

I'm wondering what should happen if one of these events cannot be delivered due to an error, or if event handling simply wasn't implemented correctly.

Should I trust my message bus in case of an application error?
- Is this a use case for dead letter queues?
On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?
- Should the service republishing events be able to access publisher and subscriber databases to know the message offset?
- Or should the subscribing microservices be able to read the outbox?

Solution

Should I trust my message bus in case of an application error?

Yes.

(Edit: After reading this answer, read @StuartLC's answer for more info)

The system you described is an eventually consistent one. It works under the assumption that if each component does its job, all components will eventually converge on a consistent state.

The Outbox's job is to ensure that any event persisted by the Event Source Microservice is durably and reliably delivered to the message bus (via the Event Publisher). Once that happens, the Event Source and the Event Publisher are done: they can assume that the event will eventually be delivered to all subscribers. From that point on, it is the message bus's job to make sure that delivery happens.
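To make the Outbox's responsibility concrete, here is a minimal in-memory sketch of an outbox relay. It is not taken from the e-book or from any specific library: `OutboxEntry`, `IEventBus`, `InMemoryBus` and `OutboxRelay` are hypothetical names, and the `List<OutboxEntry>` stands in for an outbox table that a real service would write to in the same transaction as the event itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outbox row; in a real service it is stored in the same
// transaction as the event it describes.
public sealed class OutboxEntry
{
    public Guid EventId { get; init; }
    public string Payload { get; init; } = "";
    public bool Published { get; set; }
}

// Hypothetical abstraction over the message bus client.
public interface IEventBus
{
    // Assumed to throw when the bus cannot accept the message.
    void Publish(Guid eventId, string payload);
}

// In-memory stand-in for a real bus, used only to exercise the relay.
public sealed class InMemoryBus : IEventBus
{
    public bool Available { get; set; } = true;
    public List<Guid> Delivered { get; } = new();

    public void Publish(Guid eventId, string payload)
    {
        if (!Available) throw new InvalidOperationException("bus unreachable");
        Delivered.Add(eventId);
    }
}

public sealed class OutboxRelay
{
    private readonly List<OutboxEntry> _outbox; // stand-in for the outbox table
    private readonly IEventBus _bus;

    public OutboxRelay(List<OutboxEntry> outbox, IEventBus bus)
    {
        _outbox = outbox;
        _bus = bus;
    }

    // Publishes pending entries in insertion order and returns how many
    // were published during this pass.
    public int PublishPending()
    {
        var published = 0;
        foreach (var entry in _outbox.Where(e => !e.Published).ToList())
        {
            try
            {
                _bus.Publish(entry.EventId, entry.Payload);
            }
            catch (Exception)
            {
                // The entry stays pending and is retried on the next pass.
                break;
            }
            entry.Published = true;
            published++;
        }
        return published;
    }
}
```

Calling `PublishPending()` on a timer (or after each commit) keeps retrying until the bus accepts every entry. If the process crashes after `Publish` succeeds but before `Published` is saved, the same event is sent again on the next pass, so this kind of relay gives at-least-once publishing and subscribers still need to tolerate duplicates.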

The message bus and its subscriptions can be configured for either "at least once" or "at most once" delivery. (Note that "exactly once" delivery generally cannot be guaranteed, so an application should be resilient against duplicate messages or missed messages, depending on the subscription type.)

An "at least once" (called "Peek Lock" by Azure Service Bus) subscription will hold on to the message until the subscriber gives confirmation that it was handled. If the subscriber gives confirmation, the message bus's job is done. If the subscriber responds with an error code or doesn't respond in a timely manner, the message bus may retry delivery. If delivery fails multiple times, the message may be sent to a poison message or dead-letter queue. Either way, the message bus holds on to the message until it gets confirmation that it was received.
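The retry and dead-letter behaviour described above can be sketched with a small in-memory simulation. This is not the Azure Service Bus or ActiveMQ API: `PeekLockSubscription`, `HandlerResult` and `maxDeliveryCount` are illustrative names, and a real broker also applies lock timeouts, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;

public enum HandlerResult { Completed, Failed }

// Simulated "at least once" subscription: a message is only removed once
// the handler confirms it, or once its delivery attempts are exhausted.
public sealed class PeekLockSubscription
{
    private readonly Queue<(string Body, int DeliveryCount)> _active = new();
    private readonly int _maxDeliveryCount;

    public List<string> DeadLetters { get; } = new();
    public int ActiveCount => _active.Count;

    public PeekLockSubscription(int maxDeliveryCount) => _maxDeliveryCount = maxDeliveryCount;

    public void Enqueue(string body) => _active.Enqueue((body, 0));

    // Delivers the next message, if any, to the handler.
    public void DeliverNext(Func<string, HandlerResult> handler)
    {
        if (_active.Count == 0) return;

        var (body, count) = _active.Dequeue();
        count++;

        HandlerResult result;
        try
        {
            result = handler(body);
        }
        catch (Exception)
        {
            // An exception in the subscriber counts as a failed delivery.
            result = HandlerResult.Failed;
        }

        if (result == HandlerResult.Completed) return; // confirmed: the bus is done

        if (count >= _maxDeliveryCount)
            DeadLetters.Add(body);          // retries exhausted: park for inspection
        else
            _active.Enqueue((body, count)); // redeliver later, behind newer messages
    }
}
```

Note that the failed message is re-queued behind any newer messages, which is one way redelivery ends up producing out-of-order messages for the subscriber. Messages in `DeadLetters` are not lost; they wait there until someone fixes the handler and resubmits them.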

On republishing events, should all messages be republished to all topics or would it be possible to only republish a subset?

I can't speak for all messaging systems, but I would expect a message bus to only republish to the subset of subscriptions that failed. Regardless, all subscribers should be prepared to handle duplicate and out-of-order messages.
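As a rough illustration of what "prepared to handle duplicate and out-of-order messages" can look like on the subscriber side, here is a minimal sketch. It rests on two assumptions that are not in the question: every event carries a unique `EventId` plus a per-aggregate `Version` stamped by the publisher, and each event carries the full current state, so an older version can simply be dropped. `IntegrationEvent` and `IdempotentProjection` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event envelope; EventId and Version are assumed to be
// assigned by the publishing service.
public sealed record IntegrationEvent(Guid EventId, string AggregateId, long Version, string Payload);

public sealed class IdempotentProjection
{
    // In a real subscriber both collections would live in its own database
    // and be updated in one transaction with the handler's side effects.
    private readonly HashSet<Guid> _processed = new();
    private readonly Dictionary<string, (long Version, string Payload)> _state = new();

    // Returns true if the event changed state, false if it was a
    // duplicate or older than what has already been applied.
    public bool Handle(IntegrationEvent e)
    {
        if (!_processed.Add(e.EventId))
            return false; // duplicate delivery: already handled

        if (_state.TryGetValue(e.AggregateId, out var current) && current.Version >= e.Version)
            return false; // out of order: a newer version was already applied

        _state[e.AggregateId] = (e.Version, e.Payload);
        return true;
    }

    // Throws KeyNotFoundException if no event was applied for the aggregate.
    public string Current(string aggregateId) => _state[aggregateId].Payload;
}
```

With a handler like this, redelivering a message a second time, or delivering version 1 after version 2, leaves the subscriber's state unchanged. If the events are deltas rather than full state, dropping stale versions is not enough, and the subscriber would need to buffer or re-request the missing events instead.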

Should the service republishing events be able to access publisher and subscriber databases to know the message offset?

I'm not sure I understand what you mean by "know the message offset", but as a general guideline, microservices should not share databases. A shared database schema is a contract. Once the contract is established, it is difficult to change unless you have total control over all of its consumers (both their code and deployments). It's generally better to share data through application APIs to allow more flexibility.

Or should the subscribing microservices be able to read the outbox?

The point of the message bus is to decouple the message subscribers from the message publisher. Making the subscribers explicitly aware of the publisher defeats that purpose, and will likely be difficult to maintain as the number of publishers and subscribers grows. Instead, rely on a dedicated monitoring service and/or the monitoring capabilities of the message bus to track delivery failures.