Search code examples
apache-kafkakafka-python

How to force a consumer to read a specific partition in kafka


I have an application for downloading specific web-content, from a stream of URL's generated from 1 Kafka-producer. I've created a topic with 5 partitions and there are 5 kafka-consumers. However the timeout for the webpage download is 60 seconds. While one of the url is getting downloaded, the server assumes that the message is lost and resends the data to different consumers.

I've tried everything mentioned in

Kafka consumer configuration / performance issues

and

https://github.com/spring-projects/spring-kafka/issues/202

But I keep getting different errors everytime.

Is it possible to tie a specific consumer with a partition in kafka? I am using kafka-python for my application


Solution

  • I missed on the documentation of Kafka-python. We can use TopicPartition class to assign a specific consumer with one partition.

    http://kafka-python.readthedocs.io/en/master/

    >>> # manually assign the partition list for the consumer
    >>> from kafka import TopicPartition
    >>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
    >>> consumer.assign([TopicPartition('foobar', 2)])
    >>> msg = next(consumer)