
What internal operation does a full Kafka Streams cache trigger?


When configuring the cache, we set its size and the commit interval. I understand that a commit is called once the commit interval has passed, but what operation is triggered when the cache is full? Does it also trigger a commit, which the Kafka Streams application would record in its metrics as a commit operation? Or does it simply cause a forward operation that evicts the oldest records?

My goal is to be able to monitor my Kafka Streams application and understand the metrics I am seeing.


Solution

  • The Kafka Streams cache (record cache) is used internally to cache and compact the output records of a KTable created with StreamsBuilder#table() or StreamsBuilder#globalTable(), as well as of a KTable that is the result of an aggregation. It buffers the KTable's output records before they are written to the underlying state store (RocksDB) and forwarded to downstream processors.

    The Processor API also uses this cache to buffer records before they are written to the state store, but not before they are forwarded to downstream processors.

    1. What operation is triggered when the cache is full?

      When the record cache is full (its size is set by cache.max.bytes.buffering), it flushes some output records to the underlying state store and to downstream processors. Eviction is LRU by default, so the least recently used records are flushed first. You can view a visualized example here.

    2. Does it also trigger a commit, or does it simply cause a forward operation that evicts the oldest records?

      I looked into the internal code: a full cache only flushes the eldest cached records, which writes them to the state store and forwards them to downstream processors. It does not trigger a commit, which is what flushes the producer, so the records written to the state store are not guaranteed to reach the internal Kafka changelog topic until the stream thread commits.
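
The difference between an eviction and a commit described above can be illustrated with a small toy model. This is only a sketch and not Kafka Streams code: the class RecordCacheSketch, its fields, and its entry-count capacity are hypothetical (the real cache is bounded in bytes by cache.max.bytes.buffering), and the state store and downstream processors are stood in for by a plain map and list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Kafka Streams internals. Assumption: capacity is an
// entry count here, whereas the real cache is bounded by bytes.
class RecordCacheSketch {
    private final int maxEntries; // hypothetical capacity
    // Access-ordered map: iteration starts at the least recently used key.
    private final LinkedHashMap<String, Long> dirty =
            new LinkedHashMap<>(16, 0.75f, true);
    final Map<String, Long> stateStore = new HashMap<>(); // stands in for RocksDB
    final List<String> forwarded = new ArrayList<>();     // stands in for downstream processors
    int evictions = 0;
    int commits = 0;

    RecordCacheSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Updates to the same key overwrite each other while cached (compaction).
    void put(String key, long value) {
        dirty.put(key, value);
        while (dirty.size() > maxEntries) {
            Iterator<Map.Entry<String, Long>> it = dirty.entrySet().iterator();
            Map.Entry<String, Long> eldest = it.next();
            String evictedKey = eldest.getKey();
            long evictedValue = eldest.getValue();
            it.remove();
            emit(evictedKey, evictedValue);
            evictions++; // a full cache evicts; it is not counted as a commit
        }
    }

    // A commit flushes everything still cached and is counted separately.
    void commit() {
        for (Map.Entry<String, Long> e : dirty.entrySet()) {
            emit(e.getKey(), e.getValue());
        }
        dirty.clear();
        commits++;
    }

    // Write to the stand-in state store and forward downstream.
    private void emit(String key, long value) {
        stateStore.put(key, value);
        forwarded.add(key + "=" + value);
    }

    public static void main(String[] args) {
        RecordCacheSketch cache = new RecordCacheSketch(2);
        cache.put("a", 1);
        cache.put("a", 2); // overwrites a=1 in the cache, nothing is emitted
        cache.put("b", 1);
        cache.put("c", 1); // cache is full: the eldest entry (a=2) is evicted
        System.out.println(cache.forwarded + " evictions=" + cache.evictions
                + " commits=" + cache.commits);
        cache.commit();    // the remaining entries are flushed
        System.out.println(cache.forwarded + " evictions=" + cache.evictions
                + " commits=" + cache.commits);
    }
}
```

In this model, filling the cache only increases the eviction counter and emits the single eldest entry, while the commit counter changes only when commit() is called. That mirrors the behaviour described in the answer, so a full cache should not show up as an extra commit operation when monitoring.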