Search code examples
apache-kafkamicroservicesloose-coupling

Microservices & Kafka: To couple or not to couple


I'm having a problem wrapping my mind around a probably normal setup of Microservices and Kafka we are currently setting up.

We are having one Topic in Kafka and multiple consumers reading from this Topic via separate consumer groups. But somehow I think this could lead to coupling in terms of Microservices as we are having two consumers reading the exact data from the same Topic. Additionally we do not have any retention time for the messages and therefore I'm treating The Kafka as some Kind of data store. So I would think we should rather replicate the messages into its own topic for another Service/consumer.

We are having different opinions on how this is coupling or decoupling and I'd like to hear you opinions on what I'm getting wrong because I feel like I do. Thank you for your support!


Solution

  • In my opinion using a Kafka topic for multiple services or apps to consume is the right approach as long as your services don't rely on it repeatedly. Meaning a service should read the queue once, translate the data into whatever it requires and store it by itself if required. This way the topic doesn't become a permanent data store but a rather a decoupled way to input data (as if you were to call the service directly with that raw data, but in a more decoupled fashion by allowing the service to read the topic whenever ready for it in whatever frequency that is required). This increases the resilience of your overall system.

    And there is a coupling, that is the raw data. But from my perspective it is totally OK for multiple services to understand the same data format (of the topic) - As long as its format is mostly stable. The assumption here is that this is raw data that each service has to transform into a form that is useful for itself. You just have to make sure the raw data format is versioned correctly whenever changes are necessary. And to allow services to continue to work you will have to potentially deliver multiple versions concurrently until all services support the latest version. This type of architectural style is used by many large systems and works, as long as you don't have a scenario where you need to require the raw data format to change very frequently in a way that makes it incompatible with your service designs. (If that were the case you'd probably need another layer of stable meta-model below that can describe the dynamic raw-data.)