I am using Kafka to decouple my services, but I'm having second thoughts about the way services consume inputs and produce outputs.
Say I have a service A, which takes data from some external service outside my control, so I am forced to adapt to the data format (domain) that the external system provides. Accordingly, service A pushes its results to a topic in its own format (domain).
I also have a service B, which does a similar thing to service A, but consumes some other external service and has its own data format (domain), which it pushes to a separate topic.
Now, the semantics of the data produced by A and B are similar, but not the same. The next step in the pipeline is a service C, which should consume what both A and B produce, do something with it, and emit the results.
Should C only know how to consume data from one place, which would imply that A and B (and any future producers) need to produce their outputs in a C-specific domain? That would mean that if consumer C ever changes its domain, A, B, and any other producers will have to change, which I don't like. Also, if I add another consumer, D for example, then by the same logic A and B would have to know that D is also their consumer, which looks horrible to me.
I was thinking that C should be responsible for its inputs, meaning it has a dependency on the A and B models (and on the models of any other service that produces its own data). It also implies that, when a new source is added, C must be changed to include that data as well.
Effectively, I'm leaning towards a ManySources-OneSink component, instead of OneSource-ManySinks.
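To make the ManySources-OneSink idea concrete, here is a minimal sketch of what I mean by C owning its inputs. The message shapes (`AReading`, `BReading`), their fields, and the `CInput` model are all hypothetical, invented only for illustration; the Kafka plumbing is left out on purpose.

```python
from dataclasses import dataclass

# Hypothetical message shapes; the real A and B formats are whatever
# the external systems dictate.
@dataclass
class AReading:
    device: str
    temp_c: float

@dataclass
class BReading:
    sensor_id: str
    temp_f: float

# C's own internal model (also hypothetical).
@dataclass
class CInput:
    source: str
    key: str
    temp_c: float

def from_a(msg: AReading) -> CInput:
    # A already reports Celsius, so only the field names change.
    return CInput(source="A", key=msg.device, temp_c=msg.temp_c)

def from_b(msg: BReading) -> CInput:
    # B reports Fahrenheit; C converts it into its own unit.
    return CInput(source="B", key=msg.sensor_id,
                  temp_c=(msg.temp_f - 32.0) * 5.0 / 9.0)

# One adapter per source. Adding a new source means adding an entry
# here, i.e. changing C, while A and B stay untouched.
ADAPTERS = {AReading: from_a, BReading: from_b}

def to_c_input(msg) -> CInput:
    try:
        adapter = ADAPTERS[type(msg)]
    except KeyError:
        raise TypeError(f"no adapter registered for {type(msg).__name__}")
    return adapter(msg)
```

With this shape, a change to C's domain only touches the adapters inside C, which is the trade-off I'm describing: producers stay ignorant of consumers, at the cost of C knowing every producer's model.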
Are there any preferred practices?
Are there any preferred practices?
Message specifications.
That is, instead of coupling A, B, and C to each other, you could couple them to message formats mA, mB, and mC. So long as (mA, mB, mC) are compatible, the services will be able to communicate with each other.
One way to achieve this is to restrict mA, mB, and mC to being different "versions" of the same schema, where the evolution of the schema is constrained so that the versions remain compatible with one another.
Greg Young's book Versioning in an Event Sourced System dedicates a chapter to this idea. You'll see similar ideas if you look at the various standardized message serializations (Avro, Protocol Buffers, etc.).
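As a rough illustration of what a constrained evolution can look like (this is my own sketch, not taken from the book or from any particular serialization library): new fields are only ever added as optional with a default, and readers ignore fields they don't recognise. The field names `id`, `value`, `unit`, and `note` are hypothetical, and plain JSON stands in for a real schema format.

```python
import json

# Hypothetical schema: version 1 has "id" and "value";
# version 2 adds an optional "unit" with a default.
REQUIRED = ("id", "value")
DEFAULTS = {"unit": "unknown"}

def read_message(raw: str) -> dict:
    """Read any compatible version of the message into one shape."""
    data = json.loads(raw)
    missing = [field for field in REQUIRED if field not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    known = set(REQUIRED) | set(DEFAULTS)
    # Unknown fields (from newer writers) are dropped;
    # absent optional fields (from older writers) get their default.
    return {**DEFAULTS, **{k: v for k, v in data.items() if k in known}}

v1 = '{"id": "x1", "value": 3}'
v2 = '{"id": "x1", "value": 3, "unit": "kg"}'
v3 = '{"id": "x1", "value": 3, "unit": "kg", "note": "added later"}'

for raw in (v1, v2, v3):
    print(read_message(raw))
```

The point is that a reader written against one version keeps working when it meets messages written by an older or newer producer, so no service has to be redeployed in lockstep with another.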
Should one service produce output on its own topic and make dependent services to consume from them, or vice versa, or some third option?
Mostly, that's plumbing -- how do we get a copy of a message from one system to another?
Conceptually, it seems that you want C to be consuming a logical outer join of the messages produced by A and B. So I suppose the immediate question would be whether you currently have consumers of these same messages from A that don't want the messages from B, or vice versa. If this is the only use case you know about, there might be benefits to reducing things to a single topic.
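To illustrate the single-topic view: if C really wants every message from both A and B, then from C's side the two topics behave like one interleaved stream in which nothing from either side is dropped. A toy sketch follows, with in-memory lists standing in for topics and a hypothetical `(timestamp, source, payload)` tuple layout; it assumes each list is already ordered by timestamp and says nothing about how a real Kafka consumer would be configured.

```python
import heapq

# Stand-ins for the two topics, each already ordered by timestamp.
topic_a = [(1, "A", "a-first"), (4, "A", "a-second")]
topic_b = [(2, "B", "b-first"), (3, "B", "b-second")]

def merged_stream(*topics):
    """Yield every message from every topic, ordered by timestamp."""
    return heapq.merge(*topics, key=lambda message: message[0])

for timestamp, source, payload in merged_stream(topic_a, topic_b):
    print(timestamp, source, payload)
```

If no other consumer needs A's messages without B's, publishing to one topic gives C this stream directly instead of having C rebuild it.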