Tags: web-services, heroku, database-design, web-applications, microservices

Sharing data between isolated microservices


I'd like to use the microservices architectural pattern for a new system, but I'm having trouble figuring out how to share and merge data between services that are isolated from each other. In particular, I'm thinking of returning consolidated data to populate a web app UI over HTTP.

For context, I intend to deploy each service to its own isolated environment (Heroku), where I won't be able to communicate internally between services (e.g. via localhost:PORT). I plan to use RabbitMQ for inter-service communication and PostgreSQL for the database.

The decoupling of services makes sense for CREATE operations (a rough sketch in code follows the list):

  • Authenticated user with UserId submits 'Join group' webform on the frontend
  • A new GroupJoinRequest including the UserId is added to the RabbitMQ queue
  • The Groups service picks up the event and processes it, referencing the user's UserId
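For concreteness, here's a minimal sketch of that flow using pika; the queue name, message shape, and IDs are all hypothetical, not part of the question:

```python
# Sketch of the CREATE flow; 'group_join_requests' and the JSON body
# are illustrative names, not prescriptive.
import json
import pika

# --- Frontend-facing service: publish the join request ---
connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
channel = connection.channel()
channel.queue_declare(queue="group_join_requests", durable=True)

event = {"type": "GroupJoinRequest", "userId": 42, "groupId": 7}
channel.basic_publish(
    exchange="",
    routing_key="group_join_requests",
    body=json.dumps(event),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)

# --- Groups service: consume and process ---
def handle_join(ch, method, properties, body):
    request = json.loads(body)
    # ... insert a membership row referencing request["userId"] ...
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="group_join_requests", on_message_callback=handle_join)
channel.start_consuming()
```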

However, READ operations are much harder if I want to merge data across tables/schemas. Let's say I want to get details for all the users in a certain group. In a monolithic design, I'd just do a SQL JOIN across the Users and the Groups tables, but that loses the isolation benefits of microservices.

My options seem to be as follows:

Database-per-service, public API per service

To view all the Users in a Group, a site visitor gets a list of UserIDs associated with a group from the Groups service, then queries the Users service separately to get their names.
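A minimal sketch of the client-side flow, assuming hypothetical endpoints /groups/<id>/members (returning a list of user IDs) and /users/<id>:

```python
# Sketch of option 1 from the client's side; hostnames and endpoints
# are assumptions for illustration.
import requests

GROUPS_API = "https://groups-service.example.com"
USERS_API = "https://users-service.example.com"

member_ids = requests.get(f"{GROUPS_API}/groups/7/members").json()  # e.g. [1, 2, 3]

# One extra request per user, unless the Users service offers a batch endpoint
users = [requests.get(f"{USERS_API}/users/{uid}").json() for uid in member_ids]
names = [u["name"] for u in users]
```

The per-user loop is where the "multiple HTTP requests" con below comes from.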

Pros:

  • very clear separation of concerns
  • each service is entirely responsible for its own data

Cons:

  • requires multiple HTTP requests
  • a lot of postprocessing has to be done client-side
  • the separate SQL queries can't be optimized into a single join

Database-per-service, services share data over HTTP, a single public API

A public API server handles request endpoints. Application logic in the API server makes requests to each service over an HTTP channel that is only accessible to other services in the system.
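For illustration, a sketch of such an aggregating endpoint in Flask; the internal hostnames, ports, and endpoints are hypothetical, and on Heroku they'd need to be private-network or authenticated URLs:

```python
# Sketch of option 2: a public API server that fans out to internal
# services over HTTP and merges the results server-side.
import requests
from flask import Flask, jsonify

app = Flask(__name__)

GROUPS_INTERNAL = "http://groups.internal:8001"  # reachable only inside the system
USERS_INTERNAL = "http://users.internal:8002"

@app.route("/groups/<int:group_id>/members")
def group_members(group_id):
    member_ids = requests.get(
        f"{GROUPS_INTERNAL}/groups/{group_id}/members"
    ).json()
    users = requests.get(
        f"{USERS_INTERNAL}/users",
        params={"ids": ",".join(map(str, member_ids))},
    ).json()
    return jsonify(users)  # the client gets one consolidated response
```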

Pros:

  • good separation of concerns
  • each service is responsible for an API contract but can do whatever it wants with schema and data store, so long as API responses don't change

Cons:

  • poor performance: every read fans out into multiple internal HTTP round trips
  • HTTP seems an odd transport for internal communication
  • ends up exposing multiple services to the public internet (even if they're notionally locked down), so security threats grow from the greater attack surface

Database-per-service, services share data through a message broker

Given I've already got RabbitMQ running, I could just use it to queue requests for data and then send the data itself over the same broker. For example (sketched in code after the list):

  • client requests all Users in a Group
  • the public API service sends a GetUsersInGroup event with a RequestID
  • the Groups service picks this up, and adds the UserIDs to the queue
  • the Users service picks this up, and adds the User data onto the queue
  • the API service listens for events with the RequestID, waits for the responses, merges the data into the correct format, and sends back to the client
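The API-service side of this could look roughly like the classic RabbitMQ request/reply pattern, using a correlation ID as the RequestID. Queue names and the single-reply simplification are mine, and the Groups-to-Users hop is elided:

```python
# Sketch of option 3 from the API service's side: publish a request with a
# correlation ID and block on an exclusive reply queue until the data arrives.
import json
import uuid
import pika

connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
channel = connection.channel()

# Exclusive, auto-named queue for replies to this API instance
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue

request_id = str(uuid.uuid4())
channel.basic_publish(
    exchange="",
    routing_key="get_users_in_group",
    body=json.dumps({"groupId": 7}),
    properties=pika.BasicProperties(correlation_id=request_id, reply_to=reply_queue),
)

response = None

def on_reply(ch, method, properties, body):
    global response
    if properties.correlation_id == request_id:  # ignore replies to other requests
        response = json.loads(body)
        ch.stop_consuming()

channel.basic_consume(queue=reply_queue, on_message_callback=on_reply, auto_ack=True)
channel.start_consuming()  # blocks until the merged user data arrives
```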

Pros:

  • Using existing infrastructure
  • good decoupling
  • inter-service requests remain internal (no public APIs)

Cons:

  • Multiple SQL queries
  • Lots of data processing at the application layer
  • harder to reason about
  • Seems strange to pass large quantities of data via an event system
  • Latency?

Services share a database, separated by schema, other services read from VIEWs

Services are isolated into database schemas. Schemas can only be written to by their respective services. Services expose a SQL VIEW layer on their schemas that can be queried by other services.

The VIEW functions as an API contract: even if the underlying schema or service application logic changes, the VIEW exposes the same data, so consuming services are unaffected.
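A sketch of what that contract could look like, with hypothetical users_svc and groups_svc schemas and a hypothetical role name (the DDL is shown via psycopg2 for concreteness):

```python
# Sketch of option 4: each service writes only to its own schema and
# exposes a stable VIEW for others to read.
import psycopg2

conn = psycopg2.connect("dbname=app")  # connection string is illustrative
cur = conn.cursor()

# The Users service publishes a stable VIEW as its contract:
cur.execute("""
    CREATE OR REPLACE VIEW users_svc.public_users AS
    SELECT id, display_name FROM users_svc.users;
""")
# Other services get read-only access to the view, not the tables:
cur.execute("GRANT SELECT ON users_svc.public_users TO groups_service_role;")

# 'All users in a group' is now a single cross-schema query:
cur.execute("""
    SELECT u.display_name
    FROM groups_svc.memberships m
    JOIN users_svc.public_users u ON u.id = m.user_id
    WHERE m.group_id = %s;
""", (7,))
names = [row[0] for row in cur.fetchall()]
conn.commit()
```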

Pros:

  • Presumably much more performant (a single SQL query can get all relevant data)
  • Foreign key management is much easier
  • Less infrastructure to maintain
  • Easier to run reports that span multiple services

Cons:

  • tighter coupling between services
  • breaks the idea of fundamentally atomic services that don't know about each other
  • adds a monolithic component (database) that may be hard to scale (in contrast to atomic services which can scale databases independently as required)
  • Locks all services into using the same system of record (PostgreSQL might not be the best database for all services)

I'm leaning toward the last option, but would appreciate any thoughts on other approaches.


Solution

To evaluate the pros and cons, I think you should focus on what the microservices architecture is aiming to achieve. In my opinion, microservices is an architectural style for building loosely coupled applications. It is not designed for building high-performance applications, so sacrificing some performance and accepting some data redundancy are trade-offs we take on when we decide to build applications the microservices way.

I don't think your services should share a database; the tighter coupling defeats the main objective of the microservices architecture. My suggestion is to create a consolidated data service that picks up the data-change events from all the other services and updates the database behind it. You might want to design the database behind the consolidated data service in a way that is optimized for queries (like a data warehouse), because that is all this service will be used for. You might also want to consider using a NoSQL database to back the consolidated data service.
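For concreteness, a sketch of such a consolidated data service: a consumer that applies change events to a denormalized read model. The event shape, queue name, and table schema are all assumptions for illustration:

```python
# Sketch of a consolidated data service: consume change events from RabbitMQ
# and maintain a denormalized, query-optimized read store.
import json
import pika
import psycopg2

read_db = psycopg2.connect("dbname=read_model")  # illustrative connection

def apply_event(ch, method, properties, body):
    event = json.loads(body)
    cur = read_db.cursor()
    if event["type"] == "UserJoinedGroup":
        # Upsert a flattened row (assumes a UNIQUE (group_id, user_id)
        # constraint) so 'users in a group' becomes a single-table read
        cur.execute(
            """
            INSERT INTO group_members (group_id, user_id, user_name)
            VALUES (%s, %s, %s)
            ON CONFLICT (group_id, user_id)
            DO UPDATE SET user_name = EXCLUDED.user_name
            """,
            (event["groupId"], event["userId"], event["userName"]),
        )
    read_db.commit()
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.URLParameters("amqp://localhost"))
channel = connection.channel()
channel.queue_declare(queue="domain_events", durable=True)
channel.basic_consume(queue="domain_events", on_message_callback=apply_event)
channel.start_consuming()
```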