We have a managed AWS MSK Kafka cluster with 3 brokers. For quite some time we have noticed that one of the brokers has much higher disk usage than the others.
After some analysis we found out that one of the partition replicas in the __consumer_offsets
topic is much larger than the other replicas.
The partition is roughly 40 MB on the leader and on one of the follower replicas, while the third replica is about 150 GB. We do have a few very active consumer groups (billions of messages a day), but I would expect the replicas to stay roughly the same size, especially since the leader's copy is not very big.
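For context, per-broker partition sizes can be measured without host access with the kafka-log-dirs tool; something along these lines reports the size of each replica (broker IDs 1,2,3 assumed here, matching our cluster):

./kafka-log-dirs.sh --describe \
--broker-list 1,2,3 \
--topic-list __consumer_offsets \
--bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>

The output is JSON with a size in bytes for every partition in every broker's log dir.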
We think this happened because at some point the disks on all the brokers hit 100%. We increased the disk size and restarted all the brokers.
Since it's AWS MSK we don't have access to the underlying servers. Any suggestions as to why this could happen and how to fix it?
After some fiddling around I found a solution. The idea is to recreate the problematic replica: temporarily remove it from the partition's replica set and then add it back, so the broker rebuilds it from the leader.
The steps:
First, generate a reassignment plan for the __consumer_offsets topic:
./kafka-reassign-partitions.sh \
--generate --broker-list "1,2,3" \
--topics-to-move-json-file ./topics-to-generate.json \
--bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>
Where the topics-to-generate.json file contains:
{
  "topics": [
    { "topic": "__consumer_offsets" }
  ],
  "version": 1
}
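Running the generate command prints two JSON documents, the current replica assignment and a proposed reassignment, roughly in this shape (a sketch; the actual partition list and replica order will differ):

Current partition replica assignment
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[1,2,3],"log_dirs":["any","any","any"]},...]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[2,3,1],"log_dirs":["any","any","any"]},...]}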
Save the current assignment to original-partitions-assignment.json (we will need it later to restore the replica). Then copy it to new-partitions-assignment.json and remove the broker holding the oversized replica from the replica list of the affected partition, in our case broker 3 on partition 13, leaving all other partitions unchanged. The relevant part of the edited file:

...
  {
    "topic": "__consumer_offsets",
    "partition": 12,
    "replicas": [3, 1, 2],
    "log_dirs": ["any", "any", "any"]
  },
  {
    "topic": "__consumer_offsets",
    "partition": 13,
    "replicas": [1, 2],
    "log_dirs": ["any", "any"]
  },
  {
    "topic": "__consumer_offsets",
    "partition": 14,
    "replicas": [2, 3, 1],
    "log_dirs": ["any", "any", "any"]
  },
...

Then apply the new assignment:
./kafka-reassign-partitions.sh \
--reassignment-json-file ./new-partitions-assignment.json \
--execute \
--bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>
(After you execute, you can check the status of the reassignment by replacing --execute with --verify.)
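For example, with the same reassignment file:

./kafka-reassign-partitions.sh \
--reassignment-json-file ./new-partitions-assignment.json \
--verify \
--bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>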
Once the reassignment has completed, apply the original assignment again. This adds the removed broker back as a replica of the partition; because its old log was deleted during the first reassignment, it rebuilds the replica from scratch by fetching from the leader, which brings it back to a normal size:

./kafka-reassign-partitions.sh \
--reassignment-json-file ./original-partitions-assignment.json \
--execute \
--bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>
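As a final sanity check (not part of the original steps), kafka-topics.sh can confirm that the broker is back in the replica and ISR lists of the partition:

./kafka-topics.sh --describe \
--topic __consumer_offsets \
--bootstrap-server <PUT_BOOTSTRAP_SERVERS_HERE>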