Kafka documentation of max.poll.records states:
"The maximum number of records returned in a single call to poll(). Note, that max.poll.records does not impact the underlying fetching behavior. The consumer will cache the records from each fetch request and returns them incrementally from each poll."
In many places (like here) it is stated that setting max.poll.records to a high value reduces the number of network calls made to the broker. Can someone explain how setting max.poll.records to a higher value (e.g. the default 500) rather than 1 could reduce network calls, if the fetching behavior is independent of this setting?
If the consumer's processing code handles each record one after another (to preserve the ordering guarantee for all records in a partition), is there any advantage to setting max.poll.records higher than 1?
The Kafka documentation is correct: max.poll.records doesn't affect how often the consumer sends fetch requests to the broker.
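As a rough illustration, assuming the Java client, the settings that shape fetch requests are separate from max.poll.records. In this sketch `ConsumerSettings` is a hypothetical helper, and the bootstrap address and group id are placeholders; the fetch-related keys are standard consumer config names set to their default values. A real consumer would also need key/value deserializers before the properties are passed to `new KafkaConsumer<>(...)`.

```java
import java.util.Properties;

// Hypothetical helper: builds consumer settings as plain string key/value pairs.
class ConsumerSettings {
    static Properties build(int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.setProperty("group.id", "example-group");            // placeholder consumer group

        // Only limits how many already-buffered records one poll() returns (default 500).
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));

        // Fetch size and timing are governed by settings like these (values shown are the defaults):
        props.setProperty("fetch.min.bytes", "1");                 // broker replies once at least 1 byte is available
        props.setProperty("fetch.max.wait.ms", "500");             // ...or after waiting at most 500 ms
        props.setProperty("max.partition.fetch.bytes", "1048576"); // roughly 1 MiB per partition per fetch
        return props;
    }

    public static void main(String[] args) {
        Properties oneAtATime = build(1);
        Properties batched = build(500);
        // Only max.poll.records differs between the two configurations.
        System.out.println(oneAtATime.getProperty("max.poll.records") + " vs "
                + batched.getProperty("max.poll.records"));
    }
}
```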
In a nutshell, KafkaConsumer's poll works as follows (refs):
[Figure: KafkaConsumer.poll() flow and where max.poll.records applies]
Regardless of whether max.poll.records is set high or low, a new fetch request for a partition is sent only after the records previously fetched for it have been drained from the consumer's buffer (i.e. handed out by poll()).
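The claim can be checked with a toy model. This is not KafkaConsumer's actual code: `FetchModel` is a hypothetical class modelling a single partition, where each fetch returns a fixed number of records (standing in for the byte limits that really size a fetch), and it ignores multiple partitions, fetch pipelining and rebalances. Its poll() fetches only when the local buffer is empty, then hands out at most maxPollRecords of the buffered records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of a consumer reading ONE partition; not the real KafkaConsumer.
class FetchModel {
    private final int totalRecords;    // records available on the broker (offsets 0..totalRecords-1)
    private final int recordsPerFetch; // stands in for the byte limits that size a real fetch
    private final int maxPollRecords;  // the max.poll.records value being compared
    private final Deque<Integer> buffer = new ArrayDeque<>(); // consumer-side cache of fetched records
    private int nextFetchOffset = 0;
    private int fetchRequests = 0;
    private int pollCalls = 0;

    FetchModel(int totalRecords, int recordsPerFetch, int maxPollRecords) {
        this.totalRecords = totalRecords;
        this.recordsPerFetch = recordsPerFetch;
        this.maxPollRecords = maxPollRecords;
    }

    // Simplified poll(): fetch only if the buffer is empty, then return at most maxPollRecords.
    List<Integer> poll() {
        pollCalls++;
        if (buffer.isEmpty() && nextFetchOffset < totalRecords) {
            fetchRequests++; // one network round trip to the broker
            int end = Math.min(nextFetchOffset + recordsPerFetch, totalRecords);
            for (int offset = nextFetchOffset; offset < end; offset++) {
                buffer.addLast(offset);
            }
            nextFetchOffset = end;
        }
        List<Integer> batch = new ArrayList<>();
        while (!buffer.isEmpty() && batch.size() < maxPollRecords) {
            batch.add(buffer.pollFirst());
        }
        return batch;
    }

    int fetchRequests() { return fetchRequests; }
    int pollCalls() { return pollCalls; }

    // Consume every record, processing them one at a time and checking offset order.
    static FetchModel run(int totalRecords, int recordsPerFetch, int maxPollRecords) {
        FetchModel consumer = new FetchModel(totalRecords, recordsPerFetch, maxPollRecords);
        int expectedOffset = 0;
        while (expectedOffset < totalRecords) {
            for (int offset : consumer.poll()) {
                if (offset != expectedOffset) {
                    throw new IllegalStateException("record out of order: " + offset);
                }
                expectedOffset++; // "process" the record
            }
        }
        return consumer;
    }

    public static void main(String[] args) {
        for (int maxPollRecords : new int[] {1, 500}) {
            FetchModel c = run(1000, 100, maxPollRecords);
            System.out.println("max.poll.records=" + maxPollRecords
                    + " -> fetch requests: " + c.fetchRequests()
                    + ", poll() calls: " + c.pollCalls());
        }
    }
}
```

In this model, running main reports 10 fetch requests for both settings; only the number of poll() calls changes (1000 versus 10), and records are still processed one by one in offset order either way.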