
Kafka not getting rid of data when setting retention.ms


When I looked for a way to count the messages in a topic, this command seemed like a good option:

kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 --topic rev-dly-upd --time -1

The only problem is that after I change the topic config to retention.ms=1000 and confirm the change by running kafka-topics --describe --zookeeper zookeeper1:2181 --topic rev-dly-upd, I can clearly see that the config is set to 1000:

Topic:rev-dly-upd   PartitionCount:8    ReplicationFactor:3 Configs:retention.ms=1000
    Topic: rev-dly-upd  Partition: 0    Leader: 159 Replicas: 159,96,160    Isr: 159,96,160
    Topic: rev-dly-upd  Partition: 1    Leader: 160 Replicas: 160,159,94    Isr: 94,160,159
    Topic: rev-dly-upd  Partition: 2    Leader: 94  Replicas: 94,160,95 Isr: 95,94,160
    Topic: rev-dly-upd  Partition: 3    Leader: 95  Replicas: 95,94,96  Isr: 95,96,94
    Topic: rev-dly-upd  Partition: 4    Leader: 96  Replicas: 96,95,159 Isr: 95,96,159
    Topic: rev-dly-upd  Partition: 5    Leader: 159 Replicas: 159,160,94    Isr: 159,94,160
    Topic: rev-dly-upd  Partition: 6    Leader: 160 Replicas: 160,94,95 Isr: 94,160,95
    Topic: rev-dly-upd  Partition: 7    Leader: 94  Replicas: 94,95,96  Isr: 95,96,94

Yet when I run kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 --topic rev-dly-upd --time -1 again, I still get records returned every time. What could the reasons be?


Solution

  • Basically, I had to stop using kafka-run-class kafka.tools.GetOffsetShell to count the messages in a topic. If you google "how to count messages in kafka topic", a lot of posts will lead you to think that the above command, given the right arguments, gives you the total message count. With --time -1, however, it reports the latest offset of each partition, and offsets are not reset when old messages are deleted. So if messages have been purged during the lifespan of the topic, the number no longer matches what is actually retained. For an accurate count I ended up opening a console consumer, writing its output to a text file, and counting the lines of that file with old-fashioned wc -l.
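
If dumping the whole topic through a console consumer is too heavy, a different technique is to subtract the earliest offsets (GetOffsetShell with --time -2) from the latest ones (--time -1), partition by partition. The sketch below is not the method described above, only an alternative under some assumptions: the tool prints one topic:partition:offset line per partition, and the names count_retained, earliest.txt and latest.txt are made up for illustration. On compacted or transactional topics offsets have gaps, so the result is an upper bound there rather than an exact count.

```shell
# Estimate the retained message count from two saved GetOffsetShell outputs.
# Assumption: every input line looks like "topic:partition:offset".
# count_retained, earliest.txt and latest.txt are hypothetical names.
count_retained() {
  # $1: output captured with --time -2 (earliest offsets)
  # $2: output captured with --time -1 (latest offsets)
  awk -F: '
    FILENAME == ARGV[1] { earliest[$2] = $3; next }  # remember earliest offset per partition
    { total += $3 - earliest[$2] }                   # add (latest - earliest) for that partition
    END { print total + 0 }                          # print 0 when there was no input
  ' "$1" "$2"
}

# Capturing the inputs (same broker list and topic as in the question):
#   kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 --topic rev-dly-upd --time -2 > earliest.txt
#   kafka-run-class kafka.tools.GetOffsetShell --broker-list broker1:9092,broker2:9092,broker3:9092 --topic rev-dly-upd --time -1 > latest.txt
#   count_retained earliest.txt latest.txt
```

If the retention purge has actually removed everything, the earliest and latest offsets of each partition are equal and the function prints 0, even though --time -1 on its own still shows large numbers.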