Tags: rest, spring-boot, microservices

Microservices - How to ensure referential integrity?


I'm building a personal expense manager app. To do so, I'm creating several microservices and adopting the "database per service" pattern. So I have:

  • Expense database
    • Columns are: id, category_id, name, amount, payment_date, details

  • Category database
    • Columns are: id, name

The problem I'm facing right now is: one expense can (and should) have one category. If the services have their own databases, how can I ensure that a given expense has an existing category? The only way I can imagine right now is:

When an expense is created, I make a request to the categories service to validate that the category exists. But I can clearly see a big flaw with this approach: it may work well with a single relationship, but what about when I have four more? Performance-wise, calling five other services just to ensure integrity would be a mess.
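
A minimal sketch of the per-request check described above, under stated assumptions: `CategoryClient` is a hypothetical interface standing in for the HTTP call to the categories service (in Spring Boot an implementation might wrap `RestTemplate` or `WebClient`), and `ExpenseService` keeps expenses in an in-memory list instead of a real database. It makes the cost visible: every foreign reference adds one remote round trip to each create, and the check only holds at the moment it is made.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical port to the categories service. In a Spring Boot app an
// implementation might issue GET /categories/{id} with RestTemplate or WebClient.
interface CategoryClient {
    boolean exists(long categoryId);
}

// Mirrors the columns of the expense database from the question.
record Expense(long id, long categoryId, String name, BigDecimal amount,
               LocalDate paymentDate, String details) {}

class ExpenseService {
    private final CategoryClient categories;
    private final List<Expense> store = new ArrayList<>(); // stands in for the expense DB

    ExpenseService(CategoryClient categories) {
        this.categories = categories;
    }

    // One synchronous remote check per foreign reference; with five references
    // this becomes five round trips before every insert.
    Expense create(Expense expense) {
        if (!categories.exists(expense.categoryId())) {
            throw new IllegalArgumentException("Unknown category " + expense.categoryId());
        }
        // Nothing stops the category from being deleted right after this check.
        store.add(expense);
        return expense;
    }

    int count() {
        return store.size();
    }
}
```

In a test, a lambda such as `id -> id == 1L` can stand in for the remote client.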

I have no idea how to deal with this problem. Any advice on the best way to solve it?


Solution

  • one expense can (and should) have one category. If the services have their own databases, how can I ensure that a given expense has an existing category?

    Broadly, you don't. Which is to say, information that needs to be consistent must be stored in the same place (i.e., as part of the same "microservice"). You only distribute data across multiple databases when it doesn't have to be consistent.

    One compromise that is sometimes acceptable is to store a cached copy of the category information in the expense database. That allows you to consider adding constraints that the expense data must be consistent with the cached copy of the category data, provided that you can deal with the fact that the copy will be stale and may be invalidated by changes made to the category data.
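
    One way to picture the cached-copy compromise, as a sketch: the expense service keeps a local replica of category ids, fed by events from the categories service, and validates against it without any network call. `CategoryEvent`, `CategoryUpserted`, `CategoryRemoved`, `CategoryReplica` and `LocalExpenseValidator` are hypothetical names, the event transport (for example a message broker) is assumed, and the in-memory map stands in for a table in the expense database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical events the categories service might publish (e.g. via a message broker).
interface CategoryEvent {}
record CategoryUpserted(long id, String name) implements CategoryEvent {}
record CategoryRemoved(long id) implements CategoryEvent {}

// Local, possibly stale copy of category data owned by the expense service
// (an in-memory map here; a table in the expense database in practice).
class CategoryReplica {
    private final Map<Long, String> names = new ConcurrentHashMap<>();

    // Applied asynchronously as events arrive, so it can lag behind the categories service.
    void apply(CategoryEvent event) {
        if (event instanceof CategoryUpserted upserted) {
            names.put(upserted.id(), upserted.name());
        } else if (event instanceof CategoryRemoved removed) {
            names.remove(removed.id());
        }
    }

    // Purely local lookup: does the cached copy know this category?
    boolean knows(long categoryId) {
        return names.containsKey(categoryId);
    }
}

// Expense-side constraint checked against the replica, not the live service.
class LocalExpenseValidator {
    private final CategoryReplica replica;

    LocalExpenseValidator(CategoryReplica replica) {
        this.replica = replica;
    }

    void requireKnownCategory(long categoryId) {
        if (!replica.knows(categoryId)) {
            throw new IllegalStateException("Category " + categoryId + " not in local replica (yet?)");
        }
    }
}
```

    A `false` from `knows` can mean "deleted" or merely "not replicated yet", which is exactly the staleness described above.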

    But enforcing referential integrity this way runs into race conditions: suppose I submit an expense in a category that "really" exists but hasn't appeared in the cached copy yet. What should happen? "A microsecond difference in timing shouldn’t make a difference to core business behaviors."
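
    A sketch of one possible policy for that race, assumed rather than prescribed: instead of rejecting the expense, the expense service accepts it as `PENDING_CATEGORY` and promotes it once the category shows up in its replica. `PendingAwareExpenses`, `ExpenseStatus` and `onCategoryCreated` are hypothetical names; a real system would also need a rule for categories that never arrive (for example, flagging the expense for review after a timeout).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

enum ExpenseStatus { PENDING_CATEGORY, ACCEPTED }

// Hypothetical expense-side component that defers the integrity decision
// instead of letting a timing difference reject a legitimate expense.
class PendingAwareExpenses {
    private final Set<Long> knownCategories = new HashSet<>();          // local replica
    private final Map<Long, Long> expenseCategory = new HashMap<>();    // expenseId -> categoryId
    private final Map<Long, ExpenseStatus> status = new HashMap<>();    // expenseId -> status

    ExpenseStatus submit(long expenseId, long categoryId) {
        expenseCategory.put(expenseId, categoryId);
        ExpenseStatus s = knownCategories.contains(categoryId)
                ? ExpenseStatus.ACCEPTED
                : ExpenseStatus.PENDING_CATEGORY; // category may simply not be replicated yet
        status.put(expenseId, s);
        return s;
    }

    // Called when the replica learns about a category; promotes waiting expenses.
    void onCategoryCreated(long categoryId) {
        knownCategories.add(categoryId);
        for (Map.Entry<Long, Long> e : expenseCategory.entrySet()) {
            if (e.getValue() == categoryId
                    && status.get(e.getKey()) == ExpenseStatus.PENDING_CATEGORY) {
                status.put(e.getKey(), ExpenseStatus.ACCEPTED);
            }
        }
    }

    ExpenseStatus statusOf(long expenseId) {
        return status.get(expenseId);
    }
}
```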

    Another compromise is to model time: an expense on Tuesday can use a category that was valid on Tuesday, even if that category is no longer valid on Wednesday. The expense service can then suspend judgement until it knows whether or not the category was valid at the appropriate time. This makes sense when changes to expense policies are planned in advance.
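
    A sketch of modelling time, assuming each category carries a hypothetical validity window `[validFrom, validUntil)` and the expense service tracks a hypothetical watermark `knownThrough`, the date up to which it has received every category change. An expense dated after the watermark gets `UNKNOWN_YET`, i.e. the service suspends judgement rather than guessing.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

enum Verdict { VALID, INVALID, UNKNOWN_YET }

// Hypothetical validity window: valid on [validFrom, validUntil);
// validUntil == null means "still valid".
record CategoryValidity(LocalDate validFrom, LocalDate validUntil) {
    boolean covers(LocalDate day) {
        return !day.isBefore(validFrom) && (validUntil == null || day.isBefore(validUntil));
    }
}

class TemporalCategoryCheck {
    private final Map<Long, CategoryValidity> windows = new HashMap<>();
    // Hypothetical watermark: every category change up to this date has been received.
    private LocalDate knownThrough;

    TemporalCategoryCheck(LocalDate knownThrough) {
        this.knownThrough = knownThrough;
    }

    void register(long categoryId, CategoryValidity validity) {
        windows.put(categoryId, validity);
    }

    void advanceTo(LocalDate date) {
        if (date.isAfter(knownThrough)) {
            knownThrough = date;
        }
    }

    // Judge the category as of the expense's payment date, not "now".
    Verdict check(long categoryId, LocalDate paymentDate) {
        if (paymentDate.isAfter(knownThrough)) {
            return Verdict.UNKNOWN_YET; // suspend judgement until more history arrives
        }
        CategoryValidity v = windows.get(categoryId);
        return (v != null && v.covers(paymentDate)) ? Verdict.VALID : Verdict.INVALID;
    }
}
```

    With a window ending on Wednesday 2024-01-03 (exclusive) and the watermark on that same day, a Tuesday expense is `VALID`, a Wednesday expense is `INVALID`, and a Thursday expense stays `UNKNOWN_YET` until `advanceTo` moves the watermark forward.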

    Another compromise would be to reorganize the implementation of your business capabilities so that the behaviors associated with categories are all performed by the service that manages that data. The expense service would know the category identifier, but very little else.
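
    A sketch of that reorganization, with hypothetical names throughout: the expense side only carries an opaque `CategoryId` value, while `CategoryCatalog` (standing in for the categories service) owns both the category data and a category-specific behavior, here tallying amounts per category. Because the check and the data it depends on live in one service, integrity becomes a local concern again.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Opaque identifier: the expense service stores it but never interprets it.
record CategoryId(long value) {}

// What the expense service knows about a category: only its identifier.
record ExpenseRecord(long id, CategoryId category, BigDecimal amount) {}

// Hypothetical categories service: owns category data *and* the behavior
// that depends on it, so its integrity check runs against its own data.
class CategoryCatalog {
    private final Map<CategoryId, String> names = new HashMap<>();
    private final Map<CategoryId, BigDecimal> totals = new HashMap<>();

    CategoryId create(long id, String name) {
        CategoryId categoryId = new CategoryId(id);
        names.put(categoryId, name);
        return categoryId;
    }

    // Category-specific behavior: book an expense amount into its category total,
    // refusing identifiers this service does not own.
    void book(ExpenseRecord expense) {
        if (!names.containsKey(expense.category())) {
            throw new IllegalArgumentException("Unknown category " + expense.category().value());
        }
        totals.merge(expense.category(), expense.amount(), BigDecimal::add);
    }

    BigDecimal totalFor(CategoryId id) {
        return totals.getOrDefault(id, BigDecimal.ZERO);
    }
}
```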

    There is no magic: distributed systems require compromises.