apache-kafka apache-kafka-streams

Are there any storage limits for a Kafka compacted topic?


When doing stateful processing in Kafka Streams, we can hold a large amount of state. We can provision more disk space for the client as the data grows. But what about the changelog topic? The local state is backed up in this compacted topic. Are there any limits on how much data we can store in it?

We have not encountered any issues yet, but I see that some cloud services limit the size of a compacted topic. Is this a Kafka limitation? And if so, do these limits also apply to non-compacted topics?


Solution

  • Infinite retention of a topic's log segments can be achieved by setting:

    log.retention.bytes = -1
    log.retention.hours = -1
    

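    These options have been available since version 0.9.0.0, which suggests this is a mature Kafka feature. They are broker-wide defaults; the per-topic equivalents are retention.bytes and retention.ms.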

    However, many argue that Kafka was not designed to be used as permanent storage. As the amount of data stored in Kafka grows, users eventually hit a “retention cliff,” the point at which it becomes significantly more expensive to store, manage, and retrieve data. Infrastructure costs rise because the longer the retention period, the more hardware is required.
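
To make the hardware cost concrete, the arithmetic can be sketched roughly as below. This is only a back-of-the-envelope estimate: the function names and every input figure are hypothetical, and it ignores compression, index files, tombstones, and the not-yet-compacted tail of the log.

```python
def retained_bytes(ingest_bytes_per_sec, retention_days, replication_factor):
    """Rough cluster-wide footprint of a topic that keeps everything for retention_days."""
    seconds = retention_days * 24 * 60 * 60
    return ingest_bytes_per_sec * seconds * replication_factor


def compacted_floor_bytes(unique_keys, avg_record_bytes, replication_factor):
    """Rough lower bound for a compacted topic: one latest record per key, per replica."""
    return unique_keys * avg_record_bytes * replication_factor


# Hypothetical workload: 1 MB/s ingest, replication factor 3.
one_week = retained_bytes(1_000_000, 7, 3)
one_year = retained_bytes(1_000_000, 365, 3)

# Hypothetical changelog topic: 10 million keys, about 1 KB per record.
changelog = compacted_floor_bytes(10_000_000, 1_000, 3)

print(f"7 days:    {one_week / 1e12:.2f} TB")
print(f"365 days:  {one_year / 1e12:.2f} TB")
print(f"compacted: {changelog / 1e9:.2f} GB")
```

Under these assumptions, a compacted changelog topic grows with the number of distinct keys rather than with time, which is why its footprint can stay far below that of a topic with unlimited time-based retention.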

    That said, people do use Kafka for persistent storage. For example, The New York Times uses Kafka as a source of truth, storing 160 years of journalism going back to the 1850s.

    I would suggest keeping messages small if you decide to use Kafka as a System of Record (SOR) that holds the state of an entity.

    Kafka's performance depends heavily on event/message size, which is why message size is limited.

    Kafka has a default limit of about 1MB per message (configurable via message.max.bytes on the broker and max.message.bytes on the topic), because very large messages are considered inefficient and an anti-pattern in Apache Kafka.
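
A producer-side guard against oversized messages could look something like the sketch below. The 1 MiB constant is an assumption standing in for the default mentioned above; the effective limit depends on the broker, topic, and producer configuration, and the helper name is hypothetical. Record and batch overhead are ignored.

```python
import json

# Assumption: 1 MiB approximates the default per-message limit; check your
# cluster's actual settings before relying on this value.
ASSUMED_MAX_MESSAGE_BYTES = 1024 * 1024


def fits_in_message(value, key=b"", limit=ASSUMED_MAX_MESSAGE_BYTES):
    """Return True if key + value stay within the assumed size limit."""
    return len(key) + len(value) <= limit


# A small state update serialized as JSON fits comfortably.
small = json.dumps({"id": 42, "status": "ok"}).encode("utf-8")

# A 2 MiB blob does not, unless the limit has been raised.
large = b"x" * (2 * 1024 * 1024)

print(fits_in_message(small))
print(fits_in_message(large))
print(fits_in_message(large, limit=4 * 1024 * 1024))
```

Payloads that fail such a check are usually better stored elsewhere, with only a reference placed in the message.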

    More on handling larger messages here.

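    Each topic partition log is stored on disk as a series of segment files, and a new file is created once the active one reaches the configured segment size (log.segment.bytes on the broker, segment.bytes on the topic). In stock Apache Kafka this defaults to 1 GiB; figures such as a 20MB starting size and a 100MB maximum are not Kafka's own defaults and presumably reflect a particular deployment's configuration. A partition can have multiple log files at any one time.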
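
As a rough illustration of how a partition ends up with several log files, the sketch below counts the files needed for a given amount of data. The segment size is a parameter; the 100MB value is used only because it is the figure mentioned above, and the function name is hypothetical. Segments can also roll on a time basis, so real counts may be higher.

```python
MB = 1024 * 1024


def segment_count(partition_bytes, segment_bytes):
    """Number of log files needed for partition_bytes; there is always one active file."""
    full_or_partial = -(-partition_bytes // segment_bytes)  # integer ceiling division
    return max(1, full_or_partial)


print(segment_count(450 * MB, 100 * MB))   # 450MB of data with 100MB segments
print(segment_count(450 * MB, 1024 * MB))  # same data with 1 GiB segments
```

This matters for compacted topics because compaction only works on closed segments; the active segment is never compacted, so smaller segments make data eligible for cleaning sooner.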