apache-kafka, apache-kafka-streams, kafka-producer-api

Search for point in time in Kafka (key: validFrom, validTill)


I have a use case where contractors work on my housing project for a specified period. I want to map this to Kafka and thought of a topic with a key like:

key : {"validFrom":"2019-09-01", "validTill":"2019-10-10", "name":"contractor1"}

The message is more complicated, for example costs that vary depending on which weekday "contractor1" is working for me.

Another service of mine will query the topic for "2019-10-02", and the message whose key range (validFrom to validTill) contains that date will be returned.

Is this a meaningful way to use Kafka, or am I thinking in the wrong direction? (The key will be unique.)


Solution

  • If by "point in time" you mean the time of message creation, then you can search by message timestamp. That search is very efficient because the timestamp is indexed on the server side.

    If you want to find a message based on the value of some message field, like "validFrom", that will take some time for large topics, because you'll have to scan every message in the topic. So it would make sense to use a combination of both methods: narrow the range by timestamp first, then filter by field value.

    Some UI tools allow you to do this type of search out of the box. Take a look at Kafka Magic (https://www.kafkamagic.com): it allows writing complex queries using standard JavaScript in combination with timestamp/partition/offset filters.

    If you are writing your own solution, the standard Kafka client SDKs for many languages have methods for locating messages by timestamp: point your consumer to the start timestamp and read message after message until you find what you are looking for. That is a perfectly valid method.
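
The "seek by timestamp, then scan and filter" approach described above could look roughly like the following Python sketch. It is only an illustration under several assumptions that are not in the question: the kafka-python client is used, the broker address is `localhost:9092`, keys are UTF-8 JSON strings shaped like the example key, and validTill is treated as inclusive. The helper names `key_covers`, `find_valid` and `read_records` are made up for this example.

```python
import json
from datetime import date


def key_covers(key_json, day):
    """Return True if the key's validFrom..validTill range contains `day`.

    Assumption: both bounds are inclusive ISO dates (YYYY-MM-DD).
    """
    key = json.loads(key_json)
    start = date.fromisoformat(key["validFrom"])
    end = date.fromisoformat(key["validTill"])
    return start <= day <= end


def find_valid(records, day):
    """Scan (key, value) pairs and keep those whose key range covers `day`."""
    return [(k, v) for k, v in records if key_covers(k, day)]


def read_records(topic, start_ms, servers="localhost:9092"):
    """Yield (key, value) pairs from `topic`, starting at timestamp `start_ms`.

    Assumption: kafka-python is installed and a broker is reachable at
    `servers`. The import is local so the filter helpers work without it.
    """
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(
        bootstrap_servers=servers,
        enable_auto_commit=False,
        consumer_timeout_ms=5000,  # stop iterating after 5 s without messages
    )
    partitions = consumer.partitions_for_topic(topic) or set()
    tps = [TopicPartition(topic, p) for p in partitions]
    consumer.assign(tps)

    # Ask the broker for the first offset at or after start_ms per partition.
    offsets = consumer.offsets_for_times({tp: start_ms for tp in tps})
    for tp in tps:
        found = offsets.get(tp)
        if found is None:
            consumer.seek_to_end(tp)  # no message at or after start_ms
        else:
            consumer.seek(tp, found.offset)

    try:
        for msg in consumer:
            if msg.key is not None:
                yield msg.key.decode("utf-8"), msg.value
    finally:
        consumer.close()
```

With a hypothetical topic name, usage would be something like `find_valid(read_records("contractors", start_ms), date(2019, 10, 2))`. Note that the timestamp only narrows the scan if message creation time correlates with the validity range; otherwise the scan has to start from the beginning of the topic.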