microservices, api-gateway, data-ingestion

Single data ingestion service vs multiple individual microservices?


I am trying to understand the pros and cons of having a single data ingestion microservice versus an individual microservice for each source of data.

The context: There are multiple sources from which I need to retrieve customer data the first time a customer registers on my platform. However, each source (for example Strava, Garmin, and Endomondo, all sources of fitness data) has a different method of pulling data, and some are more complex than others.

Pros for single data ingestion:

  • Fewer microservices would be present, so possibly fewer integration issues
  • Less time spent in development
  • Fewer teams need to be involved, since there is only one service (if we follow the one-team-per-service rule)

Cons for single data ingestion:

  • Harder to pinpoint failure in a service
  • Availability for all the data sources could be compromised since there is technically a single point of failure
  • As more sources appear in the future, the codebase turns into a mini monolith

Current decision: Based on the pros and cons above, having an individual service for each source looks like the better option. If I were to go ahead, I was thinking of using:

  1. API gateway pattern to encapsulate the individual microservices.
  2. Shared database pattern to store authentication tokens, for example
  3. Asynchronous messaging pattern to send the data retrieved from the sources to the final destination
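
To make patterns 1 and 3 concrete, here is a minimal sketch of how per-source services could run in isolation and hand their results to a shared message channel. Everything in it is an assumption for illustration: `strava_ingest`, `garmin_ingest`, `run_isolated`, and `ingest_all` are hypothetical names, the payload is made up, threads stand in for separately deployed services, and the standard-library `queue.Queue` stands in for a real message broker.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Activity:
    source: str
    user_id: str
    payload: dict


def strava_ingest(user_id, out):
    # Hypothetical stand-in for a Strava-specific service; the payload is made up.
    out.put(Activity("strava", user_id, {"distance_km": 5.2}))


def garmin_ingest(user_id, out):
    # Hypothetical stand-in that simulates a provider outage.
    raise ConnectionError("garmin API unavailable")


def run_isolated(ingest, user_id, out, failures):
    # Each source runs on its own; an error is recorded, not propagated.
    try:
        ingest(user_id, out)
    except Exception as exc:
        failures.append((ingest.__name__, str(exc)))


def ingest_all(user_id, services):
    out = queue.Queue()  # stands in for the asynchronous message channel
    failures = []
    threads = [
        threading.Thread(target=run_isolated, args=(svc, user_id, out, failures))
        for svc in services
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The "final destination" only reads messages; it never calls a source.
    received = []
    while not out.empty():
        received.append(out.get())
    return received, failures
```

Calling `ingest_all("u1", [strava_ingest, garmin_ingest])` in this sketch still delivers the Strava message even though the Garmin stand-in fails, which is the isolation property that the single-service cons list is worried about losing.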

I am looking forward to hearing whether I left out any pros or cons, or any counterpoints to this approach!


Solution

  • First of all, in my opinion you chose the correct option. Its greatest advantage is that the different sources are not coupled: a provider's API can change, a provider can disappear, or, as you said, new sources can appear, and the rest of your sources are not affected at all. They need no source code changes and no new releases. Using the asynchronous messaging pattern also ensures that your final destination is not coupled to your sources.

    The only point I do not fully agree with is the shared database pattern. Is the token used to authenticate with each provider the same? If so, sharing it could be necessary. Either way, if the only data that needs to be persisted and shared between the services is the token, I would use a distributed cache such as Redis, which is typically faster than a relational database for this kind of key lookup.
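
As a rough sketch of the token-cache idea, the snippet below keeps one token per provider and user and lets it expire after a time-to-live. It is an in-memory stand-in rather than a Redis client, so that it runs without a server. The class `TokenCache`, the helper `token_key`, and the `token:<provider>:<user_id>` key layout are all hypothetical choices, not something the question specifies.

```python
import time


class TokenCache:
    """In-memory stand-in for a distributed cache such as Redis (assumption)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (token, expiry timestamp)

    def set(self, key, token, ttl_seconds):
        # Store the token together with the moment it stops being valid.
        self._entries[key] = (token, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller knows to re-authenticate.
            del self._entries[key]
            return None
        return token


def token_key(provider, user_id):
    # Hypothetical key layout: one entry per provider and user.
    return f"token:{provider}:{user_id}"
```

With a real Redis client, the same two operations would typically map to a `SET` with an expiry and a `GET`. Keying by provider also means the design still works if each provider issues its own token.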