Search code examples
apache-kafkakafka-consumer-apiapache-kafka-streams

Best practice at the moment of processing data with dependencies in Kafka?


We are developing an app that takes data from different sources and once the data is available we process it, put it together and then proceed to move it to a different topic.

In our case we have 3 topics and each of these topics are going to bring data which have a relation with data from a different topic, in this case, every entity generated could be or not received at the same time (or a short period of time), and this is when the problem comes because there is a need for joining this 3 entities into one before we proceed with the moving to the topic.

Our idea was to create a separate topic which is going to contain all the data that is not processed yet and then have a separate thread that is going to check that topic in fixed intervals and also check the dependencies of this topic to be available, if they are available then we delete this entity from this separate topic, if not, we kept this entity there until it gets resolved.

At the end of all this explanation my question is if is it reasonable to do it in this way or there are other good practices or strategies that Kafka provides to solve this kind of scenarios?


Solution

  • Kafka messages could get clean after some time based on retention policy so you need to store message somewhere:

    I can see below option but always every problem have may approach and solution:

    1. Processed all message and forward "not processed message" to other topic say A
    2. Kafka Processor API to consume messages from topic A and store into the state store
    3. Schedule a punctuate() method with a time interval
    4. Iterate all messages stored in the state stored.
    5. check dependency if available delete the message from the state store and processed it or publish back to original topics to get processed again.

    You can refer below link for reference https://kafka.apache.org/10/documentation/streams/developer-guide/processor-api.html