
KafkaConsumer: `seekToEnd()` does not make consumer consume from latest offset


I have the following code:

import java.time.Duration
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.clients.consumer.KafkaConsumer

class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String, String>>) {

    fun run() {
        consumer.seekToEnd(emptyList())
        val pollDuration = 30L // seconds

        while (true) {
            val records = consumer.poll(Duration.ofSeconds(pollDuration))
            // perform record analysis and commitSync()
        }
    }
}

The topic which the consumer is subscribed to continuously receives records. Occasionally, the consumer crashes during the processing step. When the consumer is then restarted, I want it to consume from the latest offset on the topic (i.e. ignore records that were published to the topic while the consumer was down). I thought the seekToEnd() method would ensure that. However, the method seems to have no effect at all: the consumer starts consuming from the offset at which it crashed.

What is the correct way to use seekToEnd()?

Edit: The consumer is created with the following configuration:

fun <T> buildConsumer(valueDeserializer: String): KafkaConsumer<String, T> {
    val props = setupConfig(valueDeserializer)
    Common.setupConsumerSecurityProtocol(props)
    return createConsumer(props)
}

fun setupConfig(valueDeserializer: String): Properties {
    // Configuration setup
    val props = Properties()

    props[ConsumerConfig.GROUP_ID_CONFIG] = config.applicationId
    props[ConsumerConfig.CLIENT_ID_CONFIG] = config.kafka.clientId
    props[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = config.kafka.bootstrapServers
    props[AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = config.kafka.schemaRegistryUrl

    props[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = config.kafka.stringDeserializer
    props[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = valueDeserializer
    props[KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG] = "true"

    props[ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG] = config.kafka.maxPollIntervalMs
    props[ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG] = config.kafka.sessionTimeoutMs

    props[ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG] = "false"
    props[ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG] = "false"
    props[ConsumerConfig.AUTO_OFFSET_RESET_CONFIG] = "latest"

    return props
}

fun <T> createConsumer(props: Properties): KafkaConsumer<String, T> {
    val consumer = KafkaConsumer<String, T>(props)
    consumer.subscribe(listOf(config.kafka.inputTopic))
    return consumer
}

Solution

  • I found a solution!

    I needed to add a dummy poll as part of the consumer initialization. Because several Kafka consumer methods are evaluated lazily, partitions are only assigned to the consumer during a poll. Without the dummy poll, seekToEnd() runs before any partitions have been assigned, so it has nothing to seek and therefore no effect.

    It is important that the dummy poll duration is long enough for the partitions to get assigned. For instance, with consumer.poll(Duration.ofSeconds(1)), the partitions did not have time to be assigned before the program moved on to the next method call (i.e. seekToEnd()).
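    Instead of guessing a single long-enough dummy-poll duration, one option (a sketch of my own, not part of the accepted answer) is to poll briefly in a loop until assignment() reports that partitions have been assigned:

```kotlin
// Sketch: poll with a short timeout until the group rebalance completes
// and partitions are assigned, then seek. Assumes `consumer` is a
// KafkaConsumer that has already called subscribe().
while (consumer.assignment().isEmpty()) {
    consumer.poll(Duration.ofMillis(100)) // drives the rebalance protocol
}
consumer.seekToEnd(emptyList()) // empty list = seek all assigned partitions
consumer.commitSync()
```

    This avoids depending on a hard-coded duration that may be too short on a slow cluster or needlessly long on a fast one.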

    Working code could look something like this

    class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String, String>>) {

        fun run() {
            // Initialization
            val pollDuration = 30L // seconds
            consumer.poll(Duration.ofSeconds(pollDuration)) // dummy poll to get assigned partitions

            // Seek to end and commit new offset
            consumer.seekToEnd(emptyList())
            consumer.commitSync()

            while (true) {
                val records = consumer.poll(Duration.ofSeconds(pollDuration))
                // perform record analysis and commitSync()
            }
        }
    }
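    An alternative that avoids the timing sensitivity altogether (a sketch under the question's configuration, not part of the accepted answer) is to perform the seek from a ConsumerRebalanceListener, which Kafka invokes exactly at the moment partitions are assigned. The `inputTopic` parameter and String value type here are assumptions for illustration:

```kotlin
import java.time.Duration
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Sketch: call seekToEnd() inside the rebalance callback so it always
// runs after partitions are assigned, with no dummy poll needed.
fun consumeFromLatest(consumer: KafkaConsumer<String, String>, inputTopic: String) {
    consumer.subscribe(listOf(inputTopic), object : ConsumerRebalanceListener {
        override fun onPartitionsAssigned(partitions: Collection<TopicPartition>) {
            // Skip anything published while the consumer was down
            consumer.seekToEnd(partitions)
        }

        override fun onPartitionsRevoked(partitions: Collection<TopicPartition>) {
            // Nothing to do; offsets are committed manually after processing
        }
    })

    while (true) {
        val records = consumer.poll(Duration.ofSeconds(30))
        // perform record analysis and commitSync()
    }
}
```

    Because the callback runs inside poll() on the consumer thread, this also handles later rebalances: if partitions are reassigned after a group change, the consumer again starts from the latest offset rather than resuming from the last committed one.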