apache-kafka, kafka-consumer-api, apache-kafka-streams

Kafka Consumer Fairness when fetching records from its assigned partitions


Consider a Kafka topic with 3 partitions P1, P2, P3, with consumer lag of 100, 50, and 75 records respectively. Suppose that max.poll.records (the maximum number of records returned by a single call to poll()) is equal to 100.

If a consumer sends a request to fetch records from P1, P2, P3, is there any guarantee that the returned records will be selected fairly/uniformly across the available partitions, e.g., 34 records from P1, 33 from P2 and 33 from P3?

Otherwise, how is the decision on the returned records made (e.g., is it based on the first partition leader that replies to the fetch request, the most lagging partition, etc.)? In that case, how is fetch fairness guaranteed across partitions, especially to handle the case where records end up being read from a single partition out of the set of partitions assigned to the consumer?

Thank you.


Solution

  • Thanks. Still, as per the documentation, it looks like the next poll/iteration will fetch from P2 (regardless of whether new messages are written to P1).

    As before, we'd keep track of which partition we left off at so that the next iteration would begin there. This would not achieve the ideal balancing described above, but it still ensures that each partition gets consumed and requires only a single pass over the fetched records.

    So it's a kind of inter-fetch/poll fairness. Otherwise, a few partitions in the topic could starve indefinitely.
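To make the quoted behaviour concrete, here is a minimal Java sketch of the "remember where we left off" idea, using the numbers from the question (backlogs of 100, 50, 75 and a limit of 100 records per poll). It is a toy model, not the Kafka client's actual code: the class `PollFairnessSketch`, the method `simulate`, and the assumption that no new records arrive between polls are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only (hypothetical names); it does not use or reproduce the Kafka client.
class PollFairnessSketch {

    /**
     * Simulates successive polls over fixed per-partition backlogs.
     * Returns, for each poll, how many records were taken from each partition.
     */
    static List<int[]> simulate(int[] backlog, int maxPollRecords, int polls) {
        int[] remaining = backlog.clone();
        List<int[]> result = new ArrayList<>();
        int next = 0; // index of the partition where the next poll starts

        for (int p = 0; p < polls; p++) {
            int[] taken = new int[remaining.length];
            int budget = maxPollRecords;
            int visited = 0;
            int idx = next;

            while (budget > 0 && visited < remaining.length) {
                int n = Math.min(budget, remaining[idx]);
                taken[idx] = n;
                remaining[idx] -= n;
                budget -= n;
                if (remaining[idx] > 0) {
                    // Budget ran out mid-partition: resume from this partition next time.
                    break;
                }
                idx = (idx + 1) % remaining.length;
                visited++;
            }
            next = idx;
            result.add(taken);
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> perPoll = simulate(new int[] {100, 50, 75}, 100, 3);
        for (int i = 0; i < perPoll.size(); i++) {
            System.out.println("poll " + (i + 1) + ": " + Arrays.toString(perPoll.get(i)));
        }
    }
}
```

Under these assumptions the first poll takes all 100 records from P1, the second takes 50 from P2 and 50 from P3, and the third takes the remaining 25 from P3. No single poll is balanced, but every partition is served across consecutive polls, which is the inter-poll fairness described above.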