Tags: architecture, microservices, backend, system-design

Taxonomies In Reference Data


I have a set of microservices that each have their own database but sometimes share reference data. This reference data is usually a taxonomy of some sort.

For example, a list of skills with multiple parent-child relationships (graph-like) between them:

  • Operations --> HR --> Benefits
  • Front End --> React --> Hooks
  • Front End --> Vue --> VueX
  • Front End --> SPA --> React
  • Front End --> State Management --> Hooks
  • Backend --> Databases --> Postgres

Different microservices will frequently need to search their internal databases based on this taxonomy. For example, if the input/search skill is VueX, the search should also return "adjacent" entities (e.g. those related to Vue, State Management and Front End).
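
To make the "adjacent" lookup concrete, here is a minimal Python sketch. It assumes "adjacent" means every ancestor reachable by following parent links upward, which is only one possible reading. The edge list is taken from the example paths above, and the names `EDGES`, `build_parent_index` and `ancestors` are hypothetical. Note that with only the edges listed, VueX reaches Vue and Front End; relating it to State Management would need an extra edge.

```python
from collections import defaultdict, deque

# Parent -> child edges taken from the example paths in the question.
EDGES = [
    ("Operations", "HR"), ("HR", "Benefits"),
    ("Front End", "React"), ("React", "Hooks"),
    ("Front End", "Vue"), ("Vue", "VueX"),
    ("Front End", "SPA"), ("SPA", "React"),
    ("Front End", "State Management"), ("State Management", "Hooks"),
    ("Backend", "Databases"), ("Databases", "Postgres"),
]


def build_parent_index(edges):
    """Map each skill to the set of its direct parents."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents


def ancestors(skill, parents):
    """Return every skill reachable by following parent links upward (BFS)."""
    seen = set()
    queue = deque([skill])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a skill can be reached via several paths
                seen.add(parent)
                queue.append(parent)
    return seen


PARENTS = build_parent_index(EDGES)
hooks_related = ancestors("Hooks", PARENTS)
```

Because a skill such as Hooks has more than one parent, the traversal tracks visited nodes so that shared ancestors like Front End are only collected once.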

Questions:

  • Where should this reference data be stored? In a specific microservice just for reference data? Shared database?
  • How should this reference data be stored? Is this a good candidate for a graph db, or is a relational db fine here?

Solution

  • You could keep the primary copy of the reference data in its own microservice, for example a service named ReferenceData. You could then keep a copy of this data in each system that needs to use it. The services holding copies would then be eventually consistent with ReferenceData (see https://www.keboola.com/blog/eventual-consistency).

    Let's say we have a microservice named X that holds records of Y in a relational database. If the reference data changes infrequently and you don't need changes to propagate to all existing records, you could just have Y hold the string value in its table rather than the ID (this is the simplest approach).
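
As a rough illustration of the simplest approach, the sketch below stores the skill's string value directly on the Y row using Python's built-in `sqlite3`. The table name `y`, its columns and the sample row are all made up for the example.

```python
import sqlite3


def create_y_table(conn):
    # Hypothetical schema: Y stores the skill's name, not a foreign key
    # into a table owned by the ReferenceData service.
    conn.execute(
        "CREATE TABLE y (id INTEGER PRIMARY KEY, title TEXT, skill_name TEXT)"
    )


def find_by_skill(conn, skill_name):
    """Return the titles of all Y records tagged with the given skill name."""
    cursor = conn.execute(
        "SELECT title FROM y WHERE skill_name = ? ORDER BY id", (skill_name,)
    )
    return [row[0] for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
create_y_table(conn)
conn.execute(
    "INSERT INTO y (title, skill_name) VALUES (?, ?)",
    ("Dashboard rewrite", "VueX"),
)
titles = find_by_skill(conn, "VueX")
```

The trade-off is that a later rename in ReferenceData does not touch rows like this one; they keep the value that was current when they were written.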

    If you do need it to stay in sync, so that changes to the reference data reach everything that uses it, you would need a mechanism to carry that out. This can be either a passive mechanism, where data is routinely copied from one system to another, or an active mechanism, where changes are pushed out when they happen.

    For the first option, you could keep a copy of the reference data in the X database and sync it regularly (daily, half-hourly, every minute) by adding an operation to X such as uploadReferenceData and using a scheduler like cron to trigger it.
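
A minimal sketch of such a sync operation, assuming X keeps its copy in a table called `skill_copy` and that the current data can be fetched through some callable (`fetch_skills` here stands in for a call to the ReferenceData service). Both names are hypothetical, and the full replace inside one transaction is just one simple strategy.

```python
import sqlite3


def init_schema(conn):
    # Hypothetical local copy of the reference data inside X's database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS skill_copy "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )


def upload_reference_data(conn, fetch_skills):
    """Replace X's local copy with whatever fetch_skills() returns.

    fetch_skills is an assumed callable yielding (id, name) tuples; in a
    real system it would call the ReferenceData service.
    """
    rows = list(fetch_skills())
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM skill_copy")
        conn.executemany(
            "INSERT INTO skill_copy (id, name) VALUES (?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
init_schema(conn)
first_count = upload_reference_data(conn, lambda: [(1, "Vue"), (2, "VueX")])
```

A scheduler such as cron would then invoke whatever entry point wraps `upload_reference_data` at the chosen interval. Between runs, X may serve stale values, which is the eventual consistency mentioned earlier.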

    For the second option, you could use an event-driven mechanism so that when the reference data changes, X receives a notification and updates its copy.
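
The receiving side of the event-driven option might look roughly like the sketch below. The event shape (`SkillChanged` with a per-skill `version`) and the `LocalSkillCopy` class are assumptions made for illustration; the transport (message broker, webhook, etc.) is left out entirely. The version check is there so that duplicated or out-of-order notifications do not overwrite newer data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SkillChanged:
    """Hypothetical event published by ReferenceData when a skill changes."""
    skill_id: int
    name: Optional[str]  # None means the skill was deleted
    version: int         # assumed to increase with every change to the skill


class LocalSkillCopy:
    """X's in-memory copy of the reference data, updated from events."""

    def __init__(self) -> None:
        self.names: Dict[int, str] = {}
        self.versions: Dict[int, int] = {}

    def handle(self, event: SkillChanged) -> bool:
        """Apply the event; return False if it is stale or a duplicate."""
        if event.version <= self.versions.get(event.skill_id, 0):
            return False
        self.versions[event.skill_id] = event.version
        if event.name is None:
            self.names.pop(event.skill_id, None)
        else:
            self.names[event.skill_id] = event.name
        return True


copy = LocalSkillCopy()
copy.handle(SkillChanged(skill_id=1, name="Vue", version=1))
copy.handle(SkillChanged(skill_id=1, name="Vue.js", version=2))
stale_applied = copy.handle(SkillChanged(skill_id=1, name="Vue", version=1))
```

Compared with the scheduled copy, this keeps X closer to up to date, at the cost of needing reliable event delivery between the two services.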