I'm working on a microservice architecture with separated databases but need to replicate some data for resiliency.
As an example, let's say I'm working on a blog and have two domains, users and articles, each with its own database. If the users microservice goes down, I still need to be able to show an article's author name.
-- in the 'users' domain's database
create table users (
    id uuid primary key,
    name varchar(32)
);
-- in the 'articles' domain's database
create table articles (
    id uuid primary key,
    author uuid,
    author_name varchar(32),
    contents text
);
So when I'm creating an article, I send the user identifier.
My question is, at what point and how am I supposed to get the username?
FWIW, my reference for these is this F.A.Q.
Thanks a lot for reading this; I hope you'll have a solution for me! Have a nice day.
My question is, at what point and how am I supposed to get the username?
1) You fetch the username from the local cache of reference data.
2) Your reporting logic needs to support the case that the cache doesn't yet have a copy of the reference data.
3) Your reporting logic needs to support the case that the cached copy of the reference data is stale.
"Reference data" here is shorthand for any information that a service needs but for which it isn't itself the authority.
So in a typical solution, the User service would have the authoritative copy/copies of the username, and all of the logic for determining whether or not a change to that value is allowed. The Articles service would have a local copy of that data, with metadata describing how long that information may be used.
The user database would have a copy of all of the information that it is responsible for. The article database would only have the slice of user information that the article service cares about.
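To make the "local copy plus metadata" idea concrete, here is a minimal Python sketch of what the Articles service might keep. The names `CachedAuthor`, `AuthorCache`, `fetched_at` and `max_age` are assumptions made up for illustration, and an in-memory dict stands in for the articles database.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedAuthor:
    """The slice of user data the Articles service cares about (hypothetical shape)."""
    author_id: str
    name: str
    fetched_at: float  # when the copy was taken from the Users service (epoch seconds)
    max_age: float     # how long the copy may be treated as fresh (seconds)

    def is_fresh(self, now: float) -> bool:
        # The copy is usable without a refresh while it is within its allowed age.
        return now - self.fetched_at <= self.max_age


class AuthorCache:
    """In-memory stand-in for the articles-side table of reference data."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedAuthor] = {}

    def put(self, entry: CachedAuthor) -> None:
        self._entries[entry.author_id] = entry

    def get(self, author_id: str) -> Optional[CachedAuthor]:
        # Returns None when no copy has been replicated yet (case 2 above).
        return self._entries.get(author_id)
```

A caller checks `get(...)` for `None` (no copy yet) and `is_fresh(now)` (stale copy) before deciding whether to refresh from the Users service.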
A common way to implement this is to arrange a subscription, pulling the data from the users database to the articles database when the non-authoritative copy is no longer fresh.
You can treat the cache as a fallback position -- if we can't get timely access to the latest username, then use the cached copy.
But there's no magic - it will sometimes happen that the remote data is not available AND the local cache doesn't have a valid copy.
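The fallback behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: `fetch_remote` is a hypothetical callable standing in for whatever client talks to the Users service, it is assumed to raise `ConnectionError` when that service is unreachable, and a plain dict stands in for the local cache.

```python
from typing import Callable, Dict


class AuthorNameUnavailable(Exception):
    """Neither the Users service nor the local cache could supply a name."""


def resolve_author_name(
    author_id: str,
    fetch_remote: Callable[[str], str],  # hypothetical client for the Users service
    cache: Dict[str, str],               # local, non-authoritative copies
) -> str:
    try:
        name = fetch_remote(author_id)
    except ConnectionError as exc:
        # Users service is unreachable: fall back to the cached copy if there is one.
        if author_id in cache:
            return cache[author_id]
        # No magic: remote unavailable AND nothing cached.
        raise AuthorNameUnavailable(author_id) from exc
    # Remote call succeeded: refresh the local copy and return the latest name.
    cache[author_id] = name
    return name
```

The caller still has to decide what to render when `AuthorNameUnavailable` is raised, for example a placeholder such as "unknown author".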
It may help to keep in mind that a lot of your data is already reference data -- copied into your local databases by the real world.
If I may ask, instead of having metadata then pulling the data periodically to update the cache, shouldn't I just replicate it once then listen for the 'username changed' event?
What happens if that event doesn't get delivered?
In distributed systems, it's really important to ask what happens if some process fails or some message is lost right at a critical point. How do you recover?
When I follow through that line of thinking, what I end up with is that client polling is the primary mechanism for retrieving reference data, and push notifications are latency optimizations that indicate we should poll now, rather than waiting for the entire scheduled interval.
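One way to picture "polling is primary, push is a hint" is the sketch below. All names (`ReferenceDataPoller`, `fetch_all`, `on_push_notification`, `tick`) are hypothetical. `fetch_all` stands in for a call that pulls the current usernames from the Users service, and time is passed in explicitly so the scheduling logic is easy to follow.

```python
from typing import Callable, Dict


class ReferenceDataPoller:
    def __init__(self, fetch_all: Callable[[], Dict[str, str]], interval: float) -> None:
        self._fetch_all = fetch_all  # hypothetical pull from the Users service
        self._interval = interval    # scheduled polling interval in seconds
        self._next_poll_at = 0.0     # poll on the first tick
        self.names: Dict[str, str] = {}

    def on_push_notification(self) -> None:
        # A 'username changed' event only means "poll now"; its payload is not trusted
        # as the source of truth, so a lost event costs latency, not correctness.
        self._next_poll_at = 0.0

    def tick(self, now: float) -> bool:
        """Poll if a poll is due; return True when the local copy was refreshed."""
        if now < self._next_poll_at:
            return False
        # Schedule the next attempt first, so a failed poll is retried on schedule.
        self._next_poll_at = now + self._interval
        try:
            self.names = self._fetch_all()
        except ConnectionError:
            return False  # keep serving the previous (possibly stale) copy
        return True
```

If a notification is lost, the change is still picked up at the next scheduled poll; if it arrives, the poll simply happens sooner.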