Tags: web-services, heroku, database-design, web-applications, microservices

Sharing data between isolated microservices


I'd like to use the microservices architectural pattern for a new system, but I'm having trouble figuring out how to share and merge data between services that are isolated from each other. In particular, I'm thinking of returning consolidated data to populate a web app UI over HTTP.

For context, I intend to deploy each service to its own isolated environment (Heroku), where I won't be able to communicate internally between services (e.g. via localhost:PORT). I plan to use RabbitMQ for inter-service communication and PostgreSQL for the database.

The decoupling of services makes sense for CREATE operations (a rough sketch in code follows the list):

  • Authenticated user with UserId submits 'Join group' webform on the frontend
  • A new GroupJoinRequest including the UserId is added to the RabbitMQ queue
  • The Groups service picks up the event and processes it, referencing the user's UserId
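For concreteness, here's a minimal sketch of that flow using pika; the queue name, message shape, and IDs are all hypothetical, not part of the question:

```python
# Sketch of the CREATE flow; 'group_join_requests' and the JSON body
# are illustrative names, not prescriptive.
import json
import pika

# --- Frontend-facing service: publish the join request ---
connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
channel = connection.channel()
channel.queue_declare(queue="group_join_requests", durable=True)

event = {"type": "GroupJoinRequest", "userId": 42, "groupId": 7}
channel.basic_publish(
    exchange="",
    routing_key="group_join_requests",
    body=json.dumps(event),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)

# --- Groups service: consume and process ---
def handle_join(ch, method, properties, body):
    request = json.loads(body)
    # ... insert a membership row referencing request["userId"] ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="group_join_requests", on_message_callback=handle_join)
channel.start_consuming()
```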

However, READ operations are much harder if I want to merge data across tables/schemas. Let's say I want to get details for all the users in a certain group. In a monolithic design, I'd just do a SQL JOIN across the Users and the Groups tables, but that loses the isolation benefits of microservices.

My options seem to be as follows:

Database-per-service, public API per service

To view all the Users in a Group, a site visitor gets a list of UserIDs associated with a group from the Groups service, then queries the Users service separately to get their names.
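A minimal sketch of the client-side flow, assuming hypothetical endpoints /groups/<id>/members (returning a list of user IDs) and /users/<id>:

```python
# Sketch of option 1 from the client's side; hostnames and endpoints
# are assumptions for illustration.
import requests

GROUPS_API = "https://groups-service.example.com"
USERS_API = "https://users-service.example.com"

member_ids = requests.get(f"{GROUPS_API}/groups/7/members").json()  # e.g. [1, 2, 3]

# One extra request per user, unless the Users service offers a batch endpoint
users = [requests.get(f"{USERS_API}/users/{uid}").json() for uid in member_ids]
names = [u["name"] for u in users]
```

The per-user loop is where the "multiple HTTP requests" con below comes from.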

Pros:

  • very clear separation of concerns
  • each service is entirely responsible for its own data

Cons:

  • requires multiple HTTP requests
  • a lot of postprocessing has to be done client-side
  • the separate SQL queries can't be optimized into a single join

Database-per-service, services share data over HTTP, a single public API

A public API server handles request endpoints. Application logic in the API server makes requests to each service over an HTTP channel that is only accessible to other services in the system.
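For illustration, a sketch of such an aggregating endpoint in Flask; the internal hostnames, ports, and endpoints are hypothetical, and on Heroku they'd need to be private-network or authenticated URLs:

```python
# Sketch of option 2: a public API server that fans out to internal
# services over HTTP and merges the results server-side.
import requests
from flask import Flask, jsonify

app = Flask(__name__)

GROUPS_INTERNAL = "http://groups.internal:8001"  # reachable only inside the system
USERS_INTERNAL = "http://users.internal:8002"

@app.route("/groups/<int:group_id>/members")
def group_members(group_id):
    member_ids = requests.get(
        f"{GROUPS_INTERNAL}/groups/{group_id}/members"
    ).json()
    users = requests.get(
        f"{USERS_INTERNAL}/users",
        params={"ids": ",".join(map(str, member_ids))},
    ).json()
    return jsonify(users)  # the client gets one consolidated response
```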

Pros:

  • good separation of concerns
  • each service is responsible for an API contract but can do whatever it wants with schema and data store, so long as API responses don't change

Cons:

  • poor performance: every read fans out into multiple internal HTTP round trips
  • HTTP seems an odd transport for internal communication
  • ends up exposing multiple services to the public internet (even if they're notionally locked down), so security threats grow from the greater attack surface

Database-per-service, services share data through a message broker

Given I've already got RabbitMQ running, I could just use it to queue requests for data and then send the data itself over the same broker. For example (sketched in code after the list):

  • client requests all Users in a Group
  • the public API service sends a GetUsersInGroup event with a RequestID
  • the Groups service picks this up, and adds the UserIDs to the queue
  • the Users service picks this up, and adds the User data onto the queue
  • the API service listens for events with the RequestID, waits for the responses, merges the data into the correct format, and sends back to the client
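The API-service side of this could look roughly like the classic RabbitMQ request/reply pattern, using a correlation ID as the RequestID. Queue names and the single-reply simplification are mine, and the Groups-to-Users hop is elided:

```python
# Sketch of option 3 from the API service's side: publish a request with a
# correlation ID and block on an exclusive reply queue until the data arrives.
import json
import uuid
import pika

connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
channel = connection.channel()

# Exclusive, auto-named queue for replies to this API instance
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue

request_id = str(uuid.uuid4())
channel.basic_publish(
    exchange="",
    routing_key="get_users_in_group",
    body=json.dumps({"groupId": 7}),
    properties=pika.BasicProperties(correlation_id=request_id, reply_to=reply_queue),
)

response = None

def on_reply(ch, method, properties, body):
    global response
    if properties.correlation_id == request_id:  # ignore replies to other requests
        response = json.loads(body)
        ch.stop_consuming()

channel.basic_consume(queue=reply_queue, on_message_callback=on_reply, auto_ack=True)
channel.start_consuming()  # blocks until the merged user data arrives
```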

Pros:

  • Using existing infrastructure
  • good decoupling
  • inter-service requests remain internal (no public APIs)

Cons:

  • Multiple SQL queries
  • Lots of data processing at the application layer
  • harder to reason about
  • Seems strange to pass large quantities of data via an event system
  • Latency?

Services share a database, separated by schema, other services read from VIEWs

Services are isolated into database schemas. Schemas can only be written to by their respective services. Services expose a SQL VIEW layer on their schemas that can be queried by other services.

The VIEW functions as an API contract: even if the underlying schema or service application logic changes, the VIEW exposes the same data, so consuming services are unaffected.
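A sketch of what that contract could look like, with hypothetical users_svc and groups_svc schemas and a hypothetical role name (the DDL is shown via psycopg2 for concreteness):

```python
# Sketch of option 4: each service writes only to its own schema and
# exposes a stable VIEW for others to read.
import psycopg2

conn = psycopg2.connect("dbname=app")  # connection string is illustrative
cur = conn.cursor()

# The Users service publishes a stable VIEW as its contract:
cur.execute("""
    CREATE OR REPLACE VIEW users_svc.public_users AS
    SELECT id, display_name FROM users_svc.users;
""")
# Other services get read-only access to the view, not the tables:
cur.execute("GRANT SELECT ON users_svc.public_users TO groups_service_role;")

# 'All users in a group' is now a single cross-schema query:
cur.execute("""
    SELECT u.display_name
    FROM groups_svc.memberships m
    JOIN users_svc.public_users u ON u.id = m.user_id
    WHERE m.group_id = %s;
""", (7,))
names = [row[0] for row in cur.fetchall()]
conn.commit()
```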

Pros:

  • Presumably much more performant (a single SQL query can get all relevant data)
  • Foreign key management is much easier
  • Less infrastructure to maintain
  • Easier to run reports that span multiple services

Cons:

  • tighter coupling between services
  • breaks the idea of fundamentally atomic services that don't know about each other
  • adds a monolithic component (database) that may be hard to scale (in contrast to atomic services which can scale databases independently as required)
  • Locks all services into using the same system of record (PostgreSQL might not be the best database for all services)

I'm leaning toward the last option, but would appreciate any thoughts on other approaches.


Solution

To evaluate the pros and cons, I think you should focus on what the microservices architecture is aiming to achieve. In my opinion, microservices is an architectural style for building loosely coupled applications. It is not designed for building high-performance applications, so sacrificing some performance and accepting some data redundancy are trade-offs we take on when we decide to build applications the microservices way.

I don't think your services should share a database; the tighter coupling defeats the main objective of the microservices architecture. My suggestion is to create a consolidated data service that picks up the data-change events from all the other services and updates the database behind it. You might want to design the database behind the consolidated data service in a way that is optimized for queries (like a data warehouse), because that is all this service will be used for. You might also want to consider using a NoSQL database to back the consolidated data service.
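For concreteness, a sketch of such a consolidated data service: a consumer that applies change events to a denormalized read model. The event shape, queue name, and table schema are all assumptions for illustration:

```python
# Sketch of a consolidated data service: consume change events from RabbitMQ
# and maintain a denormalized, query-optimized read store.
import json
import pika
import psycopg2

read_db = psycopg2.connect("dbname=read_model")  # illustrative connection

def apply_event(ch, method, properties, body):
    event = json.loads(body)
    cur = read_db.cursor()
    if event["type"] == "UserJoinedGroup":
        # Upsert a flattened row (assumes a UNIQUE (group_id, user_id)
        # constraint) so 'users in a group' becomes a single-table read
        cur.execute(
            """
            INSERT INTO group_members (group_id, user_id, user_name)
            VALUES (%s, %s, %s)
            ON CONFLICT (group_id, user_id)
            DO UPDATE SET user_name = EXCLUDED.user_name
            """,
            (event["groupId"], event["userId"], event["userName"]),
        )
    read_db.commit()
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
channel = connection.channel()
channel.queue_declare(queue="domain_events", durable=True)
channel.basic_consume(queue="domain_events", on_message_callback=apply_event)
channel.start_consuming()
```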