Tags: java, apache-kafka, kafka-consumer-api, producer-consumer, consumer

How can I read faster from Kafka


I set up a new Kafka server (a single broker with one partition) and managed to produce to and consume from it using Java code, but I'm not satisfied with the number of events per second I'm reading as a consumer.

I have already experimented with the following consumer settings:

AUTO_OFFSET_RESET_CONFIG = "earliest"
FETCH_MAX_BYTES_CONFIG = 52428800
MAX_PARTITION_FETCH_BYTES_CONFIG = 1048576
MAX_POLL_RECORDS_CONFIG = 10000
pollDuration = 3000
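For reference, here is a minimal sketch of how those settings map onto consumer `Properties` (the broker address, group id, and String deserializers are placeholders, not from the question). One thing worth noting: with a single partition, `max.partition.fetch.bytes` (1 MB here) caps each fetch, so at 2 KB per message a fetch can likely return only around 500 records regardless of `max.poll.records`.

```java
import java.util.Properties;

public class ConsumerConfigSketch {

    public static Properties build() {
        Properties props = new Properties();
        // Placeholders -- replace with your own cluster/group
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // The settings from the question
        props.put("auto.offset.reset", "earliest");
        props.put("fetch.max.bytes", "52428800");          // 50 MB total per fetch
        props.put("max.partition.fetch.bytes", "1048576"); // 1 MB per partition -- the likely bottleneck with 1 partition
        props.put("max.poll.records", "10000");
        return props;
    }

    public static void main(String[] args) {
        // These properties would be passed to new KafkaConsumer<>(props)
        System.out.println(build().getProperty("max.poll.records")); // prints 10000
    }
}
```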

But no matter what values I set, the result stayed the same.

Currently, I produced 100,000 messages to Kafka, each 2 kilobytes in size. Reading all 100,000 records took 20,669 milliseconds (about 20 seconds), which works out to roughly 5,000 records per second.

I expected it to be much higher. What are the ideal values I can set, or do I need to use other settings, or should I set up my Kafka server differently (multiple brokers or partitions)?


Solution

  • Apart from the settings you mentioned and ignoring horizontal scaling/partitioning:

    if you are not using compression, enable it!

    From the wiki:

    If enabled, data will be compressed by the producer, written in compressed format on the server and decompressed by the consumer.

    The lz4 compression type has proved to be a good one in my experience; sample settings for the producer:

    compression.type = lz4
    batch.size = 131072
    linger.ms = 10
    

    This means less data has to be transmitted over the network, at the cost of more CPU usage for compression/decompression.
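    A minimal sketch of wiring those values into producer `Properties` (the broker address and String serializers are placeholder assumptions; the three compression-related values are the ones from the answer):

    ```java
    import java.util.Properties;

    public class ProducerConfigSketch {

        public static Properties build() {
            Properties props = new Properties();
            // Placeholders -- replace with your own cluster/serializers
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Compression settings from this answer
            props.put("compression.type", "lz4"); // compress batches with lz4
            props.put("batch.size", "131072");    // 128 KB batches -- bigger batches compress better
            props.put("linger.ms", "10");         // wait up to 10 ms so batches can fill up
            return props;
        }

        public static void main(String[] args) {
            // These properties would be passed to new KafkaProducer<>(props)
            System.out.println(build().getProperty("compression.type")); // prints lz4
        }
    }
    ```

    With `linger.ms > 0` the producer trades a little latency for fuller (and better-compressed) batches.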

    You can find more info about batch size and linger time in another answer I gave about timeouts, though it focuses on the producer side.