apache-kafka, kafka-producer-api

Kafka partition offsets are unevenly distributed by a significant amount


After a load test in one of our testing environments, the offsets of our Kafka partitions were unevenly distributed. If they were off by a couple of hundred, I'd consider that normal, but this seems to be different.

Across our 10 partitions, I am seeing the following offsets:

-------------------------
|partition  |     offset|
-------------------------
|0          |    100000+|
-------------------------
|1          |       ~200|
-------------------------
|2 - 9      |        ~50|
-------------------------
...

The load test generated unique keys and assigned them to the events it produced. According to the Kafka docs, as long as the keys are not identical they should randomly pick a partition. It seems odd to me that the offset is so high for the first partition. Does anyone know why this occurs?

This does not seem to occur to such a degree under normal circumstances, only when a load test is performed.

[Edit]: The only producer configs we set are the SSL settings. Everything else is default. The key is generated using uuid/v4 during the load test.

{
  host: process.env.KAFKA_URL,
  requestTimeout: 1000,
  ssl: true,
  sslOptions: config.sslOptions
}


Solution

  • "According to the Kafka docs, as long as the keys are not identical they should randomly pick a partition."

    For keyed records, the DefaultPartitioner class does not pick a partition at random. As the code is written, the logic is closer to

    hash(key) % numberOfPartitions

    It looks like most of your keys fall into partition 0, so it might be worth reconsidering how the keys are created and/or selecting another partitioner strategy.

    If you really want partitions to be selected round-robin, you can use null keys.
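
To make the hash(key) % numberOfPartitions idea concrete, here is a minimal Node.js sketch that simulates it for random UUID keys. Everything in it is an assumption for illustration: fnv1a, partitionFor and simulate are hypothetical helpers, FNV-1a is only a stand-in hash (the Java client hashes the serialized key with murmur2), and crypto.randomUUID() stands in for the uuid/v4 keys from the question. It does not talk to Kafka at all.

```javascript
const crypto = require('crypto');

// Stand-in 32-bit FNV-1a hash over the UTF-8 bytes of the key.
// Assumption: any reasonably well-mixed hash shows the same effect;
// this is NOT the hash Kafka itself uses.
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (const byte of Buffer.from(key, 'utf8')) {
    hash ^= byte;
    // Math.imul keeps the multiplication in 32 bits; >>> 0 makes it unsigned.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Keyed partitioning as described above: hash(key) % numberOfPartitions.
function partitionFor(key, numPartitions) {
  return fnv1a(key) % numPartitions;
}

// Count how many of `messageCount` random UUID keys land in each partition.
function simulate(messageCount, numPartitions) {
  const counts = new Array(numPartitions).fill(0);
  for (let i = 0; i < messageCount; i++) {
    counts[partitionFor(crypto.randomUUID(), numPartitions)] += 1;
  }
  return counts;
}

// Print the per-partition counts for 100000 keys over 10 partitions.
console.log(simulate(100000, 10));
```

With unique random keys, the counts should come out roughly even across all partitions. If a real producer still piles almost everything into partition 0, that would suggest the keys are not reaching the partitioner as expected, or that the client is using a different partitioner strategy than the one sketched here.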