apache-kafka, kafka-consumer-api

Impact of max.poll.records on the network call being made between consumer and broker


The Kafka documentation for max.poll.records states:

The maximum number of records returned in a single call to poll(). Note, that max.poll.records does not impact the underlying fetching behavior. The consumer will cache the records from each fetch request and returns them incrementally from each poll.

In many places (like here) it is stated that setting max.poll.records to a high value reduces the number of network calls made to the broker. Can someone explain how setting max.poll.records to a higher value (e.g. the default of 500) rather than 1 could reduce the network calls if the fetching behaviour is independent of this setting?

If the consumer processes records one after another (to maintain the ordering guarantee for all records in a partition), is there any advantage to setting max.poll.records to something higher than 1?


Solution

  • The Kafka documentation is correct. max.poll.records doesn't affect the frequency of network calls.

    In a nutshell, KafkaConsumer's poll works as follows (refs):

      1. Collect records from the internal buffer, up to max.poll.records.
      2. Send the next fetch requests for partitions that have no in-flight request and no pending records remaining in the buffer.
      3. Return the collected records.

    Regardless of whether max.poll.records is set high or low, the next fetch request for a partition is sent only once the previously fetched records for that partition have been drained from the internal buffer.
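
The three steps above can be illustrated with a toy simulation. This is only a sketch, not the real KafkaConsumer: the class PollSimulation, its simulate method, and the recordsPerFetch parameter are hypothetical names made up for illustration. It also assumes a single partition, a fixed number of records per fetch response, and a fetch that completes instantly (the real client fetches asynchronously and sizes responses in bytes). It counts how many fetch requests and how many poll() calls are needed to consume the same number of records under different max.poll.records values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the poll loop described above. NOT the real KafkaConsumer. */
public class PollSimulation {

    /** Counters collected during one simulated run. */
    public static final class Counts {
        public final int fetchRequests;
        public final int pollCalls;

        Counts(int fetchRequests, int pollCalls) {
            this.fetchRequests = fetchRequests;
            this.pollCalls = pollCalls;
        }
    }

    /**
     * Simulates consuming totalRecords from one partition.
     * recordsPerFetch is an assumption standing in for the byte-based
     * fetch sizing of the real client.
     */
    public static Counts simulate(int totalRecords, int recordsPerFetch, int maxPollRecords) {
        Deque<Integer> buffer = new ArrayDeque<>(); // internal record buffer
        int nextOffset = 0;     // next offset the simulated broker would send
        int consumed = 0;       // records handed to the application so far
        int fetchRequests = 0;
        int pollCalls = 0;

        while (consumed < totalRecords) {
            pollCalls++;

            // Step 1: collect up to maxPollRecords from the buffer.
            int returned = 0;
            while (returned < maxPollRecords && !buffer.isEmpty()) {
                buffer.poll();
                returned++;
            }

            // Step 2: fetch again only when nothing is left in the buffer.
            if (buffer.isEmpty() && nextOffset < totalRecords) {
                fetchRequests++;
                int end = Math.min(nextOffset + recordsPerFetch, totalRecords);
                while (nextOffset < end) {
                    buffer.add(nextOffset++);
                }
            }

            // Step 3: "return" the collected records to the application.
            consumed += returned;
        }
        return new Counts(fetchRequests, pollCalls);
    }

    public static void main(String[] args) {
        for (int maxPollRecords : new int[] {1, 500}) {
            Counts c = simulate(1000, 100, maxPollRecords);
            System.out.println("max.poll.records=" + maxPollRecords
                    + " fetchRequests=" + c.fetchRequests
                    + " pollCalls=" + c.pollCalls);
        }
    }
}
```

In this toy model, consuming 1000 records at 100 records per fetch takes 10 fetch requests with max.poll.records set to either 1 or 500. Only the number of poll() calls changes (1001 versus 11, counting the initial poll that finds an empty buffer).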