domain-driven-design microservices eventual-consistency

Approaches to update microservices databases retroactively

Let's suppose there are two microservices: A and B. Each has its own database.

A has a database with this schema:

{
    "id": "unique id for the user",
    "name": "the name of the user",
    "email": "the email of the user",
    "address": "the address of the user"
}

and B has this schema:

{
    "userId": "unique id for the user",
    "email": "the email of the user"
}

Whenever I insert something into A's database, an event is dispatched, and B eventually receives it and saves some of the data from A's event (currently, the user id and email).

Everything works great, and B's database has the user id and email of every user registered so far.

Now, for whatever reason, I need B to also have each user's address, so B's database schema will look like this:

{
    "userId": "unique id for the user",
    "email": "the email of the user",
    "address": "the address of the user"
}

Every user in B's database can now have an address field, but it will be null for the existing users.

My question is: what are the approaches to make B's database consistent with A's (i.e., how do I update every user in B so that the address is also populated)?

I know I can update my event to include the address, but that will only work for new users; the old ones will still have null addresses. Should I scan the whole database and manually dispatch an event for each user?

Solution

It depends on your system and on whether the data is persisted or not.

If you have the events and they contain all the data, replay them.
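A minimal sketch of what replaying could look like, assuming the stored events carry A's full user payload and that B's database can be modelled as a dictionary keyed by user id. The names `apply_user_event` and `replay` and the event shape are hypothetical, not part of any particular framework.

```python
from typing import Dict, Iterable, Optional

# Hypothetical in-memory stand-in for B's database, keyed by userId.
BStore = Dict[str, Dict[str, Optional[str]]]


def apply_user_event(store: BStore, event: dict) -> None:
    """Upsert B's record from one of A's events (assumed event shape)."""
    record = store.setdefault(
        event["id"], {"userId": event["id"], "email": None, "address": None}
    )
    # Copy only the fields B cares about, and only if the event carries them.
    for field in ("email", "address"):
        if field in event:
            record[field] = event[field]


def replay(store: BStore, events: Iterable[dict]) -> None:
    """Feed every stored event, oldest first, through the same handler."""
    for event in events:
        apply_user_event(store, event)


if __name__ == "__main__":
    b_store: BStore = {"u1": {"userId": "u1", "email": "ann@example.com"}}
    event_log = [
        {"id": "u1", "name": "Ann", "email": "ann@example.com", "address": "1 Main St"},
    ]
    replay(b_store, event_log)
    print(b_store["u1"]["address"])
```

Because the handler overwrites fields rather than appending anything, running the replay more than once leaves B in the same state.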

If you don't have the events (i.e. no Event Sourcing), write a migration service that you run once before (or during) deployment of your application and that, for every entry in B, fetches the data from A and updates it.
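Such a one-off migration could be sketched roughly as follows. This assumes B's records are reachable as a dictionary and that A exposes some way to look up a user by id; `fetch_user_from_a` is a hypothetical callable standing in for that lookup (an HTTP call, a query, etc.).

```python
from typing import Callable, Dict, Optional


def backfill_addresses(
    b_store: Dict[str, dict],
    fetch_user_from_a: Callable[[str], Optional[dict]],
) -> int:
    """For every entry in B without an address, fetch it from A and store it.

    Returns the number of records that were updated.
    """
    updated = 0
    for user_id, record in b_store.items():
        if record.get("address") is not None:
            continue  # already populated, e.g. by a newer event
        user = fetch_user_from_a(user_id)
        if user is None or user.get("address") is None:
            continue  # A has nothing to offer for this user
        record["address"] = user["address"]
        updated += 1
    return updated


if __name__ == "__main__":
    # A's data is faked with a dict here; in a real system this would be a call to A.
    a_users = {"u1": {"id": "u1", "email": "ann@example.com", "address": "1 Main St"}}
    b_store = {"u1": {"userId": "u1", "email": "ann@example.com", "address": None}}
    print(backfill_addresses(b_store, a_users.get), b_store)
```

Skipping records that already have an address keeps the script safe to rerun if it is interrupted halfway.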

Alternatively, just wait until the data updates at some point. But for such cases you will always need some kind of "initial seeding" (e.g. when A existed long before B was developed). It's no different in this case: since B is not considered the "single source of truth" (that's A; B is just a projection of A), it's easy to discard all of B's data and reseed it from A's data.

Also remember that you can update the data on every start of the application. In practice that's not an issue (assuming the data isn't too big): if you do the update during startup, before your application starts processing events, B will always end up with the most current state. If some events arrived during the migration, they will be processed again and, in the worst case, just cause some unnecessary updates. This assumes your events are idempotent, which is an important property to design for in such a system.
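As a rough illustration of reseeding at startup followed by idempotent event handling, here is a small sketch. It assumes A can hand over a snapshot of all its users and that each event carries the user's full current state; `project` and `start_b` are hypothetical names.

```python
from typing import Dict, Iterable


def project(user: dict) -> dict:
    """Reduce A's user (or an event with the same shape) to the fields B stores."""
    return {
        "userId": user["id"],
        "email": user["email"],
        "address": user.get("address"),
    }


def start_b(a_snapshot: Iterable[dict], pending_events: Iterable[dict]) -> Dict[str, dict]:
    """Rebuild B's data from A, then process events that arrived in the meantime."""
    b_store: Dict[str, dict] = {}
    # 1. Reseed: B is only a projection, so it can be rebuilt from scratch.
    for user in a_snapshot:
        b_store[user["id"]] = project(user)
    # 2. Process events. Each one overwrites the whole record, so handling
    #    the same event twice leaves the same state (idempotent).
    for event in pending_events:
        b_store[event["id"]] = project(event)
    return b_store


if __name__ == "__main__":
    snapshot = [{"id": "u1", "name": "Ann", "email": "ann@example.com", "address": "1 Main St"}]
    # This event was already reflected in the snapshot; reprocessing it is harmless.
    events = [{"id": "u1", "name": "Ann", "email": "ann@example.com", "address": "1 Main St"}]
    print(start_b(snapshot, events))
```

The redundant event in the usage example only causes an unnecessary write, which is the worst case described above.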