apache-kafka microservices

Microservices with Kafka - how can we know when a service has successfully processed a message


We currently have a topic that is consumed by two services, as outlined in the architecture below. One is an NLP service and the other is a CV (computer vision) service. They are separate because they belong to different teams.

Let's say the original message is like this:

{
    "id": 1234,
    "text": "I love pizza",
    "photo": "https://photo.service/photo001"
}

The NLP service will process the message and produce a new message to topic 1 as below:

{
    "id": 1234,
    "text": "I love pizza",
    "nlp": "pizza",
    "photo": "https://photo.service/photo001"
}

The CV service will process the same message and produce the message below to topic 2:

{
    "id": 1234,
    "text": "I love pizza",
    "photo": "https://photo.service/photo001",
    "cv": ["pizza", "restaurant", "cup", "spoon", "fork"]
}

Lastly, there's a final service that needs both pieces of information from the two services above. However, the NLP service and the CV service take different amounts of time, so their results arrive at different moments. As the final service, how do I get both messages from topic 1 and topic 2 for this particular message with id 1234?

[Image: architecture diagram]


Solution

  • You can use Kafka Streams or ksqlDB to run a join query. Otherwise, you would need an external database to do the same thing.

    For example, you would create a table from whichever events finish first, then join the second incoming stream against that table on the message ID key. Without a persistent table you can still join the two streams directly, but that assumes both events for a given ID arrive within a fixed time window.

    Alternatively, don't split the incoming stream; chain the services so that each one enriches the message and passes it on:

    A -> NLP -> CV -> final service
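
To make the join idea concrete, here is a minimal in-memory sketch in plain Python. It is not Kafka Streams or ksqlDB code; it only illustrates the logic those tools would handle for you: keep whichever result arrives first, keyed by id, and emit the merged message when the other one shows up. The class name JoinBuffer, the window_seconds parameter, and the "nlp"/"cv" source labels are all hypothetical, and a real final service would feed add() from consumers on topic 1 and topic 2 and keep this state somewhere durable.

```python
from typing import Dict, Optional


class JoinBuffer:
    """Holds the first-arriving half of each message until the other half shows up.

    Hypothetical helper for illustration only; state lives in memory.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # id -> (source name, message, arrival time)
        self._pending: Dict[int, tuple] = {}

    def add(self, source: str, message: dict, now: float) -> Optional[dict]:
        """Record a message from `source` ("nlp" or "cv").

        Returns the merged message once both halves are present, else None.
        """
        self._expire(now)
        key = message["id"]
        waiting = self._pending.get(key)
        if waiting is None or waiting[0] == source:
            # First half for this id (or a repeat from the same side): store it.
            self._pending[key] = (source, message, now)
            return None
        # The other half is already waiting: merge the two and clear the entry.
        del self._pending[key]
        return {**waiting[1], **message}

    def _expire(self, now: float) -> None:
        """Drop entries that have waited longer than the join window."""
        stale = [
            key
            for key, (_, _, arrived) in self._pending.items()
            if now - arrived > self.window_seconds
        ]
        for key in stale:
            del self._pending[key]


buffer = JoinBuffer(window_seconds=60.0)
nlp_msg = {"id": 1234, "text": "I love pizza", "nlp": "pizza",
           "photo": "https://photo.service/photo001"}
cv_msg = {"id": 1234, "text": "I love pizza",
          "photo": "https://photo.service/photo001",
          "cv": ["pizza", "restaurant"]}

first = buffer.add("nlp", nlp_msg, now=0.0)   # CV result has not arrived yet
joined = buffer.add("cv", cv_msg, now=5.0)    # both halves present, so they merge
```

The window plays the same role as the time window mentioned above: if the second result never arrives within it, the first one is dropped, so the window has to be longer than the slowest service is expected to take.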