How do I handle large messages in Kafka (e.g., larger than 20 MB)?
[2019-03-13 08:59:10,923] ERROR Error when sending message to topic test with key: 13 bytes, value: 11947696 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
[2019-03-13 03:59:14,478] ERROR Error when sending message to topic test with key: 13 bytes, value: 11947696 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.RecordTooLargeException: The message is 11947797 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
We need to set the following configurations:
Broker
replica.fetch.max.bytes: Increase this so that broker replicas can fetch large messages from each other within the cluster. If it is too small, a large message will never be fully replicated, and therefore never committed, so the consumer will never see it.
message.max.bytes: This is the largest size of the message that can be received by the broker from a producer.
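For example, to accept messages up to roughly 20 MB, the broker side of server.properties might look like the sketch below (the 20971520 value is illustrative; replica.fetch.max.bytes should be at least as large as message.max.bytes):

```properties
# server.properties -- illustrative values for ~20 MB messages
# Largest message the broker will accept from a producer
message.max.bytes=20971520
# Must be >= message.max.bytes, or large messages will never be fully replicated
replica.fetch.max.bytes=20971520
```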
Broker (topic)
max.message.bytes: The largest record batch size allowed by Kafka. If this is increased and there are consumers older than 0.10.2, the consumers' fetch size must also be increased so that they can fetch record batches this large. In the latest message format version, records are always grouped into batches for efficiency. In previous message format versions, uncompressed records are not grouped into batches and this limit only applies to a single record in that case. (Defaults to the broker's message.max.bytes.)
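Assuming a reasonably recent Kafka distribution, the topic-level override can be set with the kafka-configs tool; the topic name test, the bootstrap address, and the 20 MB value here are illustrative:

```shell
# Raise the per-topic record batch limit to ~20 MB for topic "test"
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name test \
  --add-config max.message.bytes=20971520
```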
Producer
max.request.size: The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum record batch size. Note that the server has its own cap on record batch size which may be different from this.
compression.type: Set to snappy, this will increase the total amount of data that can be sent in a single request and should be paired with a larger batch.size.
buffer.memory: If compression is enabled, the buffer size should be raised as well.
batch.size: Batch size should be at least tens of KB; diminishing returns appear at around 300 KB (less for remote clients). Larger batches also yield a better compression ratio.
linger.ms: A batch is sent as soon as it reaches batch.size, regardless of linger.ms; otherwise the producer waits up to linger.ms for more records to arrive. Increase this value to ensure smaller batches are not sent during slower production times.
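Putting the producer settings together, a sketch of a config sized for ~20 MB messages might look like this. The dotted keys mirror the Java client property names; the values (headroom, buffer size, linger time) are illustrative assumptions, and passing the dict to an actual client library is left out:

```python
# Producer configuration sized for ~20 MB messages (values are illustrative).
MESSAGE_LIMIT = 20 * 1024 * 1024  # 20 MB payload ceiling

producer_config = {
    # A little headroom above the payload for record/batch overhead.
    "max.request.size": MESSAGE_LIMIT + 1024,
    # Compression shrinks what actually crosses the wire.
    "compression.type": "snappy",
    # Raise the buffer when compression is on so batching is not starved.
    "buffer.memory": 64 * 1024 * 1024,
    # At least tens of KB; diminishing returns around ~300 KB.
    "batch.size": 300 * 1024,
    # Give batches time to fill during slow production periods.
    "linger.ms": 100,
}
```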
Consumer
fetch.message.max.bytes: The largest message size the old (pre-0.9, Scala) consumer can fetch; the newer Java consumer uses fetch.max.bytes and max.partition.fetch.bytes instead.
max.partition.fetch.bytes: The maximum amount of data per partition the server will return; it should be at least as large as the largest message you expect.
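On the consumer side, a matching sketch (again using the Java client property names in a plain dict; the values are illustrative) would size both the per-partition limit and the whole-fetch limit above the largest expected message. Newer brokers will still return the first batch in a fetch even if it exceeds these limits, but sizing them explicitly avoids surprises on older versions:

```python
# Consumer configuration matched to the ~20 MB broker/producer limits above.
MESSAGE_LIMIT = 20 * 1024 * 1024  # largest expected message

consumer_config = {
    # Maximum data returned per partition; must admit the largest record batch.
    "max.partition.fetch.bytes": MESSAGE_LIMIT + 1024,
    # Maximum data returned for the whole fetch request.
    "fetch.max.bytes": MESSAGE_LIMIT + 1024,
}
```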