apache-kafka-streams

Kafka Streams transform: fetch data on demand and cache it lazily


What, if any, is the best approach in Kafka Streams for building a table/stream that fetches data lazily when the key we ask for is not found in it?

Let's say there's a stream A of user actions with a user id field, and the goal is to enrich it with user data (email, name, etc.) by joining on user id with a table/stream B that provides the user data. If the user id key is not found in B, we fetch the data and put it there. The intent is to expose this as a regular stream/table.

Thank you!

Not sure if I found anything yet.


Solution

  • It sounds like you need to implement a custom Processor with an attached state store. For each input record, look up the key in the store. If the key is found, compute the join result and forward it. If the key is not found, fetch the data from the external source, update the state store, then compute the join result and forward it.

    For more details, see the Processor API docs: https://kafka.apache.org/38/documentation/streams/developer-guide/processor-api.html
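The lookup-or-fetch step described above can be sketched in plain Java. This is only a model of the logic, not Kafka Streams code: the `HashMap` stands in for the attached `KeyValueStore`, the return value of `enrich` stands in for forwarding a record downstream, and `LazyUserEnricher`, `fetchUser`, and the string-based join result are hypothetical names chosen for illustration. In a real topology the same steps would presumably live in the `process()` method of a `Processor`, using `get`/`put` on the store and `forward` on the processor context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of the lookup-or-fetch step; NOT a Kafka Streams class.
final class LazyUserEnricher {
    // Stand-in for the state store attached to the processor (assumption).
    private final Map<String, String> store = new HashMap<>();
    // Hypothetical on-demand lookup, e.g. a call to a user service.
    private final Function<String, String> fetchUser;

    LazyUserEnricher(Function<String, String> fetchUser) {
        this.fetchUser = fetchUser;
    }

    // Models process(record): returns the join result that would be forwarded,
    // or null if no user data could be found (nothing to forward).
    String enrich(String userId, String action) {
        String user = store.get(userId);      // 1. look up the store
        if (user == null) {
            user = fetchUser.apply(userId);   // 2. cache miss: fetch on demand
            if (user == null) {
                return null;                  // unknown user: skip the join
            }
            store.put(userId, user);          // 3. update the state
        }
        return action + " by " + user;        // 4. compute the join result
    }

    // Helper for inspection: true once the key has been cached.
    boolean isCached(String userId) {
        return store.containsKey(userId);
    }

    public static void main(String[] args) {
        LazyUserEnricher enricher =
            new LazyUserEnricher(id -> "user-" + id + "@example.com");
        System.out.println(enricher.enrich("42", "login"));   // fetches, then caches
        System.out.println(enricher.enrich("42", "logout"));  // served from the store
    }
}
```

One design point to keep in mind when moving this into a real processor: a blocking remote call inside `process()` holds up the stream thread for that task, so the fetch should be fast or bounded by a timeout.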