apache-kafka, kafka-consumer-api, apache-kafka-streams

Retrieve Kafka messages whose field matches one value in a very long list


I am kind of new to Kafka.

I have a conceptual question. Assume there is a Kafka topic (publish-subscribe) whose messages are formatted as JSON. Each message has a field called "username". Multiple applications consume this topic. Suppose one application handles messages for 100,000 users and holds the list of those 100,000 usernames. This application needs to watch the topic and process only the messages whose username field matches one of its 100,000 usernames.

One way to do this is to read each published message, extract its username, and iterate through our list of 100,000 usernames. If a name in the list matches, we process the message; otherwise we ignore it.
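The linear-scan approach described above might look like the following minimal Python sketch. It assumes each record value is UTF-8 JSON bytes with a top-level `username` field; the function name `should_process_naive` is hypothetical, not part of any Kafka API. Because the loop compares against every entry until it finds a match, each record can cost up to 100,000 string comparisons.

```python
from __future__ import annotations

import json


def should_process_naive(record_value: bytes, usernames: list[str]) -> bool:
    """Return True if the record's "username" is in our list (linear scan)."""
    message = json.loads(record_value)  # deserialize the JSON payload
    username = message.get("username")  # assumes the payload is a JSON object
    # Walk the whole list until a match is found: O(len(usernames)) per record.
    for name in usernames:
        if name == username:
            return True
    return False


if __name__ == "__main__":
    users = [f"user{i}" for i in range(100_000)]
    print(should_process_naive(b'{"username": "user99999", "text": "hi"}', users))  # True
    print(should_process_naive(b'{"username": "mallory"}', users))  # False
```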

Is there a more elegant way to do this? For example, is there a feature in Kafka Streams or the Consumer API for this?

Thanks


Solution

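  • You must consume, deserialize, and inspect every record. No higher-level library lets you get around these Consumer API basics. ksqlDB or Kafka Streams make such code easier to write, but not any more performant.

    To check whether a field's value is in the list, use a HashSet rather than iterating over a list: a set lookup takes constant time on average, while scanning a list of 100,000 names takes time proportional to its length.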
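A minimal Python sketch of the set-based check follows, assuming JSON record values with a top-level `username` field. The names `match_record` and `filter_records` are hypothetical helpers. In a real consumer you would pass the raw bytes of each polled record (for example `msg.value()` from a confluent-kafka `Consumer.poll()` loop) and skip records that return `None`. Python's `frozenset` plays the role of a HashSet here: membership is an average O(1) hash lookup, so each record costs one lookup instead of a scan of 100,000 names. The consumer still receives and deserializes every record; the set only makes the per-record check cheap.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def match_record(record_value: bytes | None, usernames: frozenset[str]) -> dict[str, Any] | None:
    """Return the decoded message if its "username" is one of ours, else None."""
    if record_value is None:  # e.g. a tombstone record with no value
        return None
    try:
        message = json.loads(record_value)
    except ValueError:  # malformed JSON or undecodable bytes
        return None
    if not isinstance(message, dict):
        return None
    username = message.get("username")
    # Set membership is an average O(1) hash lookup, however many names we track.
    if isinstance(username, str) and username in usernames:
        return message
    return None


def filter_records(values: Iterable[bytes | None], usernames: frozenset[str]) -> Iterator[dict[str, Any]]:
    """Yield only the messages that belong to our users."""
    for value in values:
        message = match_record(value, usernames)
        if message is not None:
            yield message


if __name__ == "__main__":
    # Build the lookup set once at startup, not once per record.
    our_users = frozenset(f"user{i}" for i in range(100_000))
    incoming = [
        b'{"username": "user42", "text": "hello"}',
        b'{"username": "stranger", "text": "ignored"}',
        b"not json",
        None,
    ]
    for msg in filter_records(incoming, our_users):
        print(msg["username"], msg["text"])  # prints: user42 hello
```

Building the set once and reusing it keeps the per-record work constant; rebuilding it inside the poll loop would throw that benefit away.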