apache-kafka apache-kafka-streams ktable

Apache Kafka - KStream and KTable hard disk space requirements


I am trying to better understand what happens at the resource level when you create a KStream and a KTable. Below I describe some conclusions I have come to, as I understand them (feel free to correct me).

First, every topic has a number of partitions, and all the messages in those partitions are stored on disk in sequential order.

A KStream does not need to store the messages it reads from a topic in another location, because the offset is sufficient to retrieve those messages from the topic it is connected to. (Is this correct?)

My question is about the KTable. As I understand it, a KTable, in contrast to a KStream, updates the stored value whenever a message with the same key arrives. To do that, you have to either store the messages that arrive from the topic externally in a static table, or read the entire message queue each time a new message arrives. The latter does not seem very efficient in terms of time. Is the first approach correct?


Solution

  • read all the message queue, each time a new message arrives.

    All messages are read only on a fresh start of the application. Once the app has read up to the latest offset, it just updates the table as each new record arrives, like any other consumer.

    Disk usage ultimately depends on the state store you have configured for the application and on that store's own settings: for example, in-memory, RocksDB, or an external state store implementation that you have written yourself.
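
To make the "read once, then update" behavior concrete, here is a minimal sketch in plain Python. It is a toy model, not the Kafka Streams API: `ToyTable`, `poll`, `apply`, and the list standing in for a topic partition are all hypothetical names invented for illustration. It assumes the table keeps only the latest value per key and treats a `None` value as a delete (tombstone).

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a None value is treated as a tombstone (delete). Assumption for this sketch.
Record = Tuple[str, Optional[str]]


class ToyTable:
    """Toy stand-in for a KTable backed by a key-value state store (hypothetical)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}  # latest value per key
        self.next_offset = 0             # next log offset to read
        self.records_read = 0            # how many records were processed in total

    def apply(self, record: Record) -> None:
        key, value = record
        if value is None:
            self.store.pop(key, None)    # tombstone removes the key
        else:
            self.store[key] = value      # upsert: overwrite the previous value
        self.records_read += 1

    def poll(self, log: List[Record]) -> None:
        """Read only the records appended since the last poll."""
        for record in log[self.next_offset:]:
            self.apply(record)
        self.next_offset = len(log)


# A Python list stands in for one topic partition.
log: List[Record] = [("a", "1"), ("b", "2"), ("a", "3")]
table = ToyTable()
table.poll(log)            # fresh start: replays the 3 existing records
log.append(("b", None))    # new tombstone for key "b"
log.append(("c", "4"))
table.poll(log)            # reads only the 2 new records, not the whole log

print(table.store)         # {'a': '3', 'c': '4'}
print(table.records_read)  # 5
```

Each record is processed exactly once (5 reads for 5 records), and the store holds one entry per live key rather than one per message. In this model the store's footprint tracks the number of distinct keys; how much of that ends up on disk in a real application depends on the configured state store, as noted above.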