Let us say we have the following:
I want to enrich this topic X with data from outside Kafka (i.e. from a CSV file that contains all the entities Y).
The solution I have now is as follows:
I am still evaluating whether Kafka Streams or KSQL can do the same for me.
My question: is there an efficient way to do this with the Kafka Streams library or KSQL without losing performance?
Sure, you can do something like this:
final Map<String, String> m = new HashMap<>(); // lookup data loaded from the CSV
builder.stream(topic).mapValues(v -> m.get(v)).to(out);
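The snippet above leaves out how the map gets filled. A minimal sketch, assuming the CSV has two columns (`id,name`), no header row, and no quoted commas; the class name `CsvLookup` and the sample rows are made up for illustration. In a real application the lines could come from something like `Files.readAllLines(path)`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvLookup {
    // Turns "id,name" lines into a lookup map keyed by id.
    // Assumption: simple CSV without a header or quoted fields.
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> m = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(",", 2); // split on the first comma only
            if (parts.length == 2) {
                m.put(parts[0].trim(), parts[1].trim());
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the real CSV file.
        List<String> lines = Arrays.asList("y1,Entity One", "y2,Entity Two");
        Map<String, String> m = parse(lines);
        System.out.println(m.get("y1"));
    }
}
```

Records whose value has no entry in the map would come out as `null` from `m.get(v)`, so you may want to filter those before writing to the output topic.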
But Kafka Streams is ideally going to be distributed, and your CSV would therefore need to be synced across multiple machines.
Rather than building a map, use a KeyValueStore via a KTable (the store can also be in memory, but RocksDB is more fault tolerant). Use the Kafka Connect Spooldir connector to load the CSV into a topic, build a table from that topic, and then the enrichment is a join between topics only.
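A rough sketch of what that join could look like in Kafka Streams, not a definitive implementation. The topic names (`topic-x`, `entities-y`, `topic-x-enriched`) and the class name are hypothetical. It assumes String keys and values, that the connector writes each CSV row to `entities-y` keyed by the Y id, and that the value of each record in `topic-x` is the Y id to look up (mirroring `m.get(v)` above).

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class EnrichTopology {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Table of Y entities, loaded into this topic from the CSV (assumed keyed by Y id).
        KTable<String, String> entities = builder.table(
                "entities-y", Consumed.with(Serdes.String(), Serdes.String()));

        // Stream of X records whose value is assumed to be the Y id.
        KStream<String, String> events = builder.stream(
                "topic-x", Consumed.with(Serdes.String(), Serdes.String()));

        events
                // Re-key by Y id so the stream and table share a key; this causes a repartition.
                .selectKey((key, value) -> value)
                // Inner join: records without a matching Y entity are dropped.
                .join(entities,
                        (yId, yData) -> yId + "," + yData,
                        Joined.with(Serdes.String(), Serdes.String(), Serdes.String()))
                .to("topic-x-enriched", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the processor graph; running it for real needs a KafkaStreams instance and a broker.
        System.out.println(build().describe());
    }
}
```

If the Y data is small and rarely changes, a GlobalKTable is another option, since it avoids the re-keying step at the cost of every instance holding a full copy of the table.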