rest architecture microservices domain-driven-design webclient

Data duplication or async on-demand oriented communication on microservices

In our current microservice, we store data that doesn't belong to us, persisting all of it from external events, and we use this duplicated data in our actual calculation. I've been wondering: what if we replaced the duplicated data with async, on-demand WebClient calls with resilience fallbacks? Everywhere we need the data, we would call the owning team through its APIs. That would free us from maintaining the duplicate data, which often becomes inconsistent when the owning team stops publishing events because of an internal error.

In terms of CAP, consistency is more important for us. We can give the responsibility of availability to the data owner team. As for the "why not a monolith?" counterargument: in many companies each service has its own team, and designing a monolith is not up to you.

My question is really about this general, company-wide problem. When your service, inevitably, depends on another team's service, is it better to duplicate a data or async on-demand dependency?

Solution

When your service, inevitably, depends on another team's service, is it better to duplicate a data or async on-demand dependency?

As usual with architecture questions, the answer is "it depends". The "correct" answer for your case depends on which trade-off is more acceptable to you.

Data duplication (with some async process refreshing it, typically via a queue) is a pretty standard way to handle things in a microservices architecture. But the copy can become stale, as you have observed.
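To make the staleness problem concrete, here is a minimal sketch of a local replica that records when each entry was last refreshed, so the consumer can at least notice that updates have stopped arriving. Everything in it (`LocalReplica`, `apply_event`, the `version` field on events, the `max_age_seconds` threshold) is hypothetical and not taken from the question. Age since the last event is only a heuristic: data that simply has not changed also looks old unless the owner publishes periodic heartbeats.

```python
import time
from dataclasses import dataclass


@dataclass
class ReplicaEntry:
    value: dict
    version: int        # assumed: the owner's events carry a monotonically increasing version
    received_at: float  # local clock time when the event was applied


class LocalReplica:
    """Local copy of another team's data, fed by their events (hypothetical sketch)."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._entries = {}  # key -> ReplicaEntry

    def apply_event(self, key, value, version, now=None):
        """Apply an update event; return False if it is a duplicate or out of order."""
        now = time.time() if now is None else now
        current = self._entries.get(key)
        if current is not None and version <= current.version:
            return False
        self._entries[key] = ReplicaEntry(value, version, now)
        return True

    def get(self, key, now=None):
        """Return (value, is_fresh); value is None if no event was ever received."""
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None, False
        is_fresh = (now - entry.received_at) <= self.max_age_seconds
        return entry.value, is_fresh
```

With something like this, the calculation can decide per read whether a copy that is no longer fresh is acceptable, or whether it should refuse to proceed.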

On the other hand, directly calling another service's API has trade-offs too. First of all, by switching to synchronous communication (a non-blocking client call is still request/response, so it counts as synchronous in this sense) your services become more tightly coupled, with all the possible downsides of that: for example, unavailability of the other service will make your API that relies on it fail too, and your SLAs will depend on the underlying service's SLAs.
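As a rough illustration of the SLA point, here is a back-of-the-envelope calculation. It assumes that every request needs the dependency, that failures are independent, and that there is no fallback; the function names and the 99.9% figures are made up for the example.

```python
def composite_availability(own, *dependencies):
    """Availability of a request path that needs every listed service to be up.

    Assumes independent failures and no fallback, so availabilities multiply.
    """
    result = own
    for dependency in dependencies:
        result *= dependency
    return result


def monthly_downtime_minutes(availability, days=30):
    """Expected downtime in minutes over a month of the given length."""
    return (1 - availability) * days * 24 * 60


combined = composite_availability(0.999, 0.999)
print(f"{combined:.4%}")                            # 99.8001%
print(f"{monthly_downtime_minutes(0.999):.1f}")     # 43.2
print(f"{monthly_downtime_minutes(combined):.1f}")  # 86.4
```

Under these assumptions, putting one 99.9% dependency in the request path of a 99.9% service roughly doubles the expected downtime.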

We can give the responsibility of availability to the data owner team.

If the other team can guarantee the SLA/SLO you need (which is a bit doubtful, since you mentioned that they can't even guarantee publishing the data updates), then switching to synchronous communication seems to be the way to go.

in our actual calculation

For async processes (background calculations and the like), IMO the default approach is to call the service holding the needed data directly, provided that service can handle the load, responds quickly enough, and so on. So if your calculation is part of some async process, switching to calling the owner system seems to be the way to go.
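The "on-demand call with a resilience fallback" from the question could be sketched roughly as below. This is a language-agnostic idea shown with Python's `asyncio` rather than Spring WebClient; `fetch_with_fallback`, the stand-in owner calls and `last_known_value` are all hypothetical names, and a real setup would likely add retries or a circuit breaker.

```python
import asyncio


async def fetch_with_fallback(fetch, fallback, timeout_seconds):
    """Await fetch(); on timeout or connection failure, return fallback() instead."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_seconds)
    except (asyncio.TimeoutError, ConnectionError):
        return fallback()


# Hypothetical stand-ins for calls to the owning team's API.
async def healthy_owner_call():
    return {"price": 10, "source": "owner"}


async def slow_owner_call():
    await asyncio.sleep(1.0)  # simulates an owner service that responds too slowly
    return {"price": 10, "source": "owner"}


async def failing_owner_call():
    raise ConnectionError("owner service unreachable")


def last_known_value():
    # Assumed fallback: a previously stored value. It could just as well
    # raise an error or defer the calculation to a later retry.
    return {"price": 9, "source": "fallback"}
```

For example, `asyncio.run(fetch_with_fallback(slow_owner_call, last_known_value, 0.05))` gives up after 50 ms and returns the fallback value. Note that if consistency matters more than availability, falling back to a stored value brings the stale-data problem back, so failing or retrying the calculation may be the more fitting fallback.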