Kafka partition huge replica size

We have a managed AWS MSK Kafka cluster with 3 brokers. For quite some time, one of the brokers has had much higher disk usage than the others.

After some analysis, we found that one of the partition replicas in the __consumer_offsets topic is much larger than the other replicas. The partition is roughly 40 MB on the leader and on one of the followers, but 150 GB on the third replica. We do have a few very active consumer groups (billions of messages a day), but I would expect the replicas to even out, especially since the leader's copy is not very big.
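For anyone who wants to run the same comparison without access to the brokers: the kafka-log-dirs.sh tool that ships with Kafka (run with --describe, --bootstrap-server and --topic-list __consumer_offsets) prints per-replica sizes as a single JSON line. Below is a minimal sketch that flags skewed partitions in that JSON. The JSON shape is an assumption based on the tool's usual output, the function names are hypothetical, and the sample values are made up to mirror the sizes described above.

```python
import json


def replica_sizes(log_dirs_report, topic):
    """Return {partition_name: {broker_id: size_in_bytes}} for one topic."""
    sizes = {}
    for broker in log_dirs_report["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                name = part["partition"]  # e.g. "__consumer_offsets-13"
                if name.rsplit("-", 1)[0] == topic:
                    sizes.setdefault(name, {})[broker["broker"]] = part["size"]
    return sizes


def skewed_partitions(sizes, ratio=10):
    """Names of partitions whose largest replica exceeds `ratio` x the smallest."""
    flagged = []
    for name, per_broker in sorted(sizes.items()):
        smallest = min(per_broker.values())
        largest = max(per_broker.values())
        if largest > ratio * max(smallest, 1):
            flagged.append(name)
    return flagged


# Made-up sample in the assumed output shape (not real cluster data).
sample_report = json.loads("""
{"version": 1, "brokers": [
  {"broker": 1, "logDirs": [{"logDir": "/data", "error": null, "partitions": [
    {"partition": "__consumer_offsets-12", "size": 40000000, "offsetLag": 0, "isFuture": false},
    {"partition": "__consumer_offsets-13", "size": 40000000, "offsetLag": 0, "isFuture": false}]}]},
  {"broker": 2, "logDirs": [{"logDir": "/data", "error": null, "partitions": [
    {"partition": "__consumer_offsets-12", "size": 40000000, "offsetLag": 0, "isFuture": false},
    {"partition": "__consumer_offsets-13", "size": 40000000, "offsetLag": 0, "isFuture": false}]}]},
  {"broker": 3, "logDirs": [{"logDir": "/data", "error": null, "partitions": [
    {"partition": "__consumer_offsets-12", "size": 40000000, "offsetLag": 0, "isFuture": false},
    {"partition": "__consumer_offsets-13", "size": 150000000000, "offsetLag": 0, "isFuture": false}]}]}
]}
""")

sizes = replica_sizes(sample_report, "__consumer_offsets")
flagged = skewed_partitions(sizes)
print(flagged)
```

The ratio of 10 is an arbitrary threshold for the sketch; pick whatever makes sense for your cluster.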

We think it happened because at some point the disks on all the brokers were 100% full. We increased the disk size and restarted all the brokers.

Since it's AWS MSK, we don't have access to the servers. Any suggestions as to why this could happen and how to solve it?

Solution

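After fiddling around I found a solution. The idea is to recreate the problematic replica: remove it from the partition's replica set so that the broker deletes its copy, then add it back so that it is replicated again from the leader.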

The steps:

1. Get the existing partition assignment using:

./kafka-reassign-partitions.sh \
  --generate --broker-list "1,2,3" \
  --topics-to-move-json-file ./topics-to-generate.json \
  --bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>

Where the topics-to-generate.json file is:

{
  "topics": [
    {"topic": "__consumer_offsets"}
  ],
  "version":1
}
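The --generate command prints two JSON documents: the current assignment and a proposed one. Only the current one is needed here. A minimal sketch for pulling it out of the captured output, assuming each document is printed on a single line and the current assignment comes first (that layout is an assumption about the tool's output, and the function name and sample text are hypothetical):

```python
import json


def split_generate_output(text):
    """Return (current, proposed) assignments parsed from --generate output.

    Assumes each JSON document sits on its own line and that the current
    assignment is printed before the proposed one.
    """
    docs = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{"):
            docs.append(json.loads(line))
    if len(docs) < 2:
        raise ValueError("expected two JSON documents in the --generate output")
    return docs[0], docs[1]


# Made-up output in the assumed layout (not captured from a real cluster).
sample_output = """Current partition replica assignment
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":12,"replicas":[3,1,2],"log_dirs":["any","any","any"]}]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":12,"replicas":[1,2,3],"log_dirs":["any","any","any"]}]}
"""

current, proposed = split_generate_output(sample_output)
print(json.dumps(current, indent=2))
```

Writing `current` to a file with json.dump gives you the original assignment to restore in the last step.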

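2. Take the current assignment from the output of step 1 (the JSON printed under "Current partition replica assignment") and save an unmodified copy as original-partitions-assignment.json. In a second copy, saved as new-partitions-assignment.json, update the problematic partition by removing the problematic replica and its log dir (in our case it was partition 13 on broker 3):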

...
        {
            "topic": "__consumer_offsets",
            "partition": 12,
            "replicas": [3,1,2],
            "log_dirs": ["any","any","any"]
        },
        {
            "topic": "__consumer_offsets",
            "partition": 13,
            "replicas": [1,2],
            "log_dirs": ["any","any"]
        },
        {
            "topic": "__consumer_offsets",
            "partition": 14,
            "replicas": [2,3,1],
            "log_dirs": ["any","any","any"]
        },
...
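If you would rather not edit the JSON by hand, the change can be scripted. This is a minimal sketch, not part of the original procedure: the function name is hypothetical, and the starting replica order for partition 13 ([1,2,3]) is assumed for illustration since the original order is not shown above.

```python
import copy
import json


def drop_replica(assignment, topic, partition, broker):
    """Return a copy of the assignment with one broker removed from one partition.

    The matching log_dirs entry is removed as well so both lists stay aligned.
    """
    updated = copy.deepcopy(assignment)
    for entry in updated["partitions"]:
        if entry["topic"] == topic and entry["partition"] == partition:
            if broker not in entry["replicas"]:
                raise ValueError(f"broker {broker} is not a replica of {topic}-{partition}")
            idx = entry["replicas"].index(broker)
            del entry["replicas"][idx]
            if "log_dirs" in entry:
                del entry["log_dirs"][idx]
            return updated
    raise ValueError(f"{topic}-{partition} not found in assignment")


# Illustrative input; the replica order for partition 13 is an assumption.
original = {
    "version": 1,
    "partitions": [
        {"topic": "__consumer_offsets", "partition": 12,
         "replicas": [3, 1, 2], "log_dirs": ["any", "any", "any"]},
        {"topic": "__consumer_offsets", "partition": 13,
         "replicas": [1, 2, 3], "log_dirs": ["any", "any", "any"]},
        {"topic": "__consumer_offsets", "partition": 14,
         "replicas": [2, 3, 1], "log_dirs": ["any", "any", "any"]},
    ],
}

new_assignment = drop_replica(original, "__consumer_offsets", 13, 3)
print(json.dumps(new_assignment["partitions"][1]))
```

The function works on a deep copy, so `original` stays untouched and can be written out as the file to restore later.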

3. Apply the new configuration:

./kafka-reassign-partitions.sh \
  --reassignment-json-file ./new-partitions-assignment.json \
  --execute \
  --bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>

(After you execute, you can check the status of the reassignment by running the same command with --verify instead of --execute.)

4. Wait for the disk usage on the affected broker to go down.

5. Restore the original partition assignment using the original config from step 1:

./kafka-reassign-partitions.sh \
  --reassignment-json-file ./original-partitions-assignment.json \
  --execute \
  --bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>