Search code examples
apache-kafkakafka-consumer-apikafka-python

reading only specific messages from kafka topic


Scenario:

I am writing data JSON object data into kafka topic while reading I want to read an only specific set of messages based on the value present in the message. I am using kafka-python library.

sample messages:

{flow_status: "completed", value: 1, active: yes}
{flow_status:"failure",value 2, active:yes}

Here I want to read only messages having flow_Status as completed.


Solution

  • In Kafka it's not possible doing something like that. The consumer consumes messages one by one, one after the other starting from the latest committed offset (or from the beginning, or seeking at a specific offset). Depends on your use case, maybe you could have a different flow in your scenario: the message taking the process to do goes into a topic but then the application which processes the action, then writes the result (completed or failed) in two different topics: in this way you have all completed separated from failed. Another way is to use a Kafka Streams application for doing the filtering but taking into account that it's just a sugar, in reality the streams application will always read all the messages but allowing you to filter messages easily.