Search code examples
apache-kafkakafka-consumer-apikafka-python

multiprocessing in kafka-python


I have been using the python-kaka module to consume from a kafka broker. I want to consume from the same topic with 'x' number of partitions in parallel. The documentation has this :

# Use multiple consumers in parallel w/ 0.9 kafka brokers
# typically you would run each on a different server / process / CPU
 consumer1 = KafkaConsumer('my-topic',
                      group_id='my-group',
                      bootstrap_servers='my.server.com')
  consumer2 = KafkaConsumer('my-topic',
                      group_id='my-group',
                      bootstrap_servers='my.server.com')

Does this mean I can create a separate consumer for each process that I spawn? Also, will there be an overlap on the messages being consumed by consumer1 and consumer2 ?

Thanks


Solution

  • Yes, you can create multiple consumers in multiple threads/processes (and even run them in parallel on different machines). As long as all consumers use the same group.id, there will be no overlap. Kafka assigns each topic partition to a single consumer within a consumer group. Be aware, that using more consumers than available topic partitions will result in idle consumers.