apache-kafka apache-kafka-streams

KTable shows the whole event log instead of the most recent update (Kafka Streams)


I have a topic containing 4 events. Raw topic events: [image: topic data]

The following topology should print only the most recent update for each stock, but it prints the whole event log. This is the code:

val builder = new streams.StreamsBuilder
val tableData = builder.table[String, StockData](inputTopic)
tableData.toStream().print(Printed.toSysOut[String, StockData].withLabel("table-form"))
builder.build()

But the resulting log looks like this:

[table-form]: CCC, {"stockVal":10,"times":1234}
[table-form]: BBB, {"stockVal":10,"times":1234}
[table-form]: AAA, {"stockVal":10,"times":1234}
[table-form]: AAA, {"stockVal":20,"times":1240}

Why is the AAA stock printed twice? All the messages are already in the topic when the app runs, so to the best of my understanding I should only get the last value of AAA.

Thanks!


Solution

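  • For any table, table.toStream() emits the table's changelog: whatever records the topic currently holds (compacted or not), plus every new update as it arrives. It does not deduplicate by key, so each update to AAA shows up as its own line in the output.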

    If you always want only the current state of the table, you'll need to query (iterate) its state store instead of printing the stream.

    Keep in mind that the raw topic backing the table can also hold several records for the same key until log compaction runs, and compaction only happens if the topic is configured for it.
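
To make the difference concrete, here is a minimal plain-Scala sketch of the two views. It does not use any Kafka classes: `StockData`, `records`, `changelog` and `currentState` are names made up for illustration, and the four records are the ones from the question's log. The changelog view yields one output per update, while the state view keeps only the latest value per key.

```scala
// Plain-Scala model of KTable semantics; no Kafka classes are involved.
// StockData mirrors the JSON fields shown in the question's log output.
final case class StockData(stockVal: Int, times: Long)

object TableVsChangelog {
  // The four records from the question, in the order they were printed.
  val records: List[(String, StockData)] = List(
    "CCC" -> StockData(10, 1234L),
    "BBB" -> StockData(10, 1234L),
    "AAA" -> StockData(10, 1234L),
    "AAA" -> StockData(20, 1240L)
  )

  // Changelog view (what toStream() models): every update is passed
  // through as its own event, so repeated keys are not collapsed.
  def changelog(in: List[(String, StockData)]): List[(String, StockData)] = in

  // State view (what the table's store models): each record upserts
  // its key, so only the latest value per key survives.
  def currentState(in: List[(String, StockData)]): Map[String, StockData] =
    in.foldLeft(Map.empty[String, StockData]) {
      case (state, (key, value)) => state.updated(key, value)
    }

  def main(args: Array[String]): Unit = {
    // Four lines, AAA appears twice.
    changelog(records).foreach { case (k, v) => println(s"[table-form]: $k, $v") }
    // Three lines, AAA appears once with its latest value.
    currentState(records).toSeq.sortBy(_._1).foreach {
      case (k, v) => println(s"[store]: $k, $v")
    }
  }
}
```

In a real application the counterpart of `currentState` would be the table's own state store: materialize the table under a store name, obtain a read-only view through `KafkaStreams#store` with `QueryableStoreTypes.keyValueStore()`, and iterate it with `all()` (closing the iterator afterwards). The exact wiring depends on your serdes and Kafka Streams version, so treat this as a pointer rather than a recipe.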