We are seeing some interesting file-handle and CPU-usage behavior on our Kafka cluster that I cannot explain :) I'm not sure which information is needed to figure out the cause, so I will list some (tell me if anything is missing):
In addition, we have 4 compacted topics (1 partition each) with a very small segment_ms and retention_ms, both set to 1 minute. These topics are used as a cache to serve the latest values.
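For reference, the cache-topic settings described above might look roughly like this as a topic-level config. This is a sketch under an assumption: that segment_ms and retention_ms in the question correspond to Kafka's segment.ms and retention.ms topic configs. The helper name cache_topic_config is hypothetical.

```python
ONE_MINUTE_MS = 60 * 1000


def cache_topic_config(window_ms: int = ONE_MINUTE_MS) -> dict[str, str]:
    """Hypothetical helper: topic-level config for a small compacted
    'latest value' cache topic, as described in the question.

    Assumes the question's segment_ms / retention_ms map to Kafka's
    segment.ms / retention.ms topic configs.
    """
    return {
        "cleanup.policy": "compact",     # keep only the latest value per key
        "segment.ms": str(window_ms),    # segment becomes eligible to roll after 1 minute
        "retention.ms": str(window_ms),  # value taken from the question
    }


if __name__ == "__main__":
    print(cache_topic_config())
```

A dict of string values like this can, for example, be passed as the config argument when creating a topic with an admin client such as confluent-kafka's NewTopic; check your client's documentation for the exact parameter.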
Here is a metric showing the sawtooth behavior:
The file-handle spikes follow a cycle of about 7 days and also seem to correlate with CPU usage. The default segment_ms (which we use for most of our topics) is 7 days. I'm not sure whether this is related.
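A quick bit of arithmetic relating the two settings mentioned in the question. Kafka's default segment.ms is 604800000 ms (7 days), which matches the observed cycle length. For comparison, the small cache topics with a 1-minute segment time could roll many more segments in the same period. The function names are illustrative, and the upper bound assumes at least one write per minute, since (as far as I know) Kafka only rolls a segment when a new record is appended.

```python
MS_PER_MINUTE = 60 * 1000
MS_PER_DAY = 24 * 60 * MS_PER_MINUTE  # 86,400,000 ms

# Kafka's default segment.ms (equivalent to log.roll.hours = 168)
DEFAULT_SEGMENT_MS = 604_800_000


def ms_to_days(ms: int) -> float:
    """Convert milliseconds to days."""
    return ms / MS_PER_DAY


def max_rolls(window_ms: int, segment_ms: int) -> int:
    """Upper bound on time-based segment rolls within a window, assuming
    at least one write arrives every segment_ms."""
    return window_ms // segment_ms


if __name__ == "__main__":
    print(ms_to_days(DEFAULT_SEGMENT_MS))                # 7.0
    print(max_rolls(DEFAULT_SEGMENT_MS, MS_PER_MINUTE))  # 10080
```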
Any ideas why this happens? Thanks!
Apparently, this behavior is caused by our compacted topics. We switched almost all topics from the "compact" cleanup policy to "delete" and kept only the 4 that really need compaction (the caches). Since then the behavior is back to normal (as you can see over the last couple of days).
In Kafka, each partition of a topic is stored as a sequence of segments. A segment is only "garbage collected" once its last entry is gone. If a topic is compacted, a single entry that never receives a newer value can block the whole segment from being "garbage collected". Every retained segment keeps its files open, so this leads to many open file handles. With "delete", segments are garbage collected more steadily as they expire.
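The explanation above can be illustrated with a deliberately simplified toy model. It is not how Kafka's log cleaner actually works: the real cleaner copies surviving records into new, possibly merged segments, and the active segment is never removed. The model only shows why one never-updated key can pin an old segment under this mental model, while a time-based "delete" policy drops segments once their newest record is past retention. FILES_PER_SEGMENT = 3 is an assumption (a .log, .index and .timeindex file per segment), and all function names are hypothetical.

```python
FILES_PER_SEGMENT = 3  # assumption: .log, .index and .timeindex per segment


def live_segments_compact(segments: list[list[str]]) -> list[int]:
    """Toy 'compact' model: a segment survives as long as it still holds
    the latest value of at least one key."""
    last_seen: dict[str, int] = {}
    for seg_idx, keys in enumerate(segments):
        for key in keys:
            last_seen[key] = seg_idx  # a later write supersedes earlier ones
    return sorted(set(last_seen.values()))


def live_segments_delete(newest_record_ages_ms: list[int], retention_ms: int) -> list[int]:
    """Toy 'delete' model: a segment survives only while its newest record
    is still within retention."""
    return [i for i, age in enumerate(newest_record_ages_ms) if age <= retention_ms]


def estimated_open_files(live_segment_count: int) -> int:
    """Rough file-handle estimate under the FILES_PER_SEGMENT assumption."""
    return live_segment_count * FILES_PER_SEGMENT


if __name__ == "__main__":
    # Key "b" is written once and never updated, so it pins segment 0.
    log = [["a", "b"], ["a"], ["c"], ["a"]]
    kept = live_segments_compact(log)
    print(kept, estimated_open_files(len(kept)))  # [0, 2, 3] 9

    ages = [300_000, 200_000, 100_000, 30_000]
    kept = live_segments_delete(ages, retention_ms=60_000)
    print(kept, estimated_open_files(len(kept)))  # [3] 3
```

In the compacted case, segment 0 cannot go away only because of key "b", which mirrors the "single entries blocking the whole segment" effect described in the answer.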