apache-kafka, apache-kafka-connect, strimzi

How many kafka connect clusters are optimal?


I use Kafka with the Strimzi operator, and I am wondering whether there is an optimal number of Kafka Connect clusters to run. I have a medium-sized infrastructure with around 50 different business-logic use cases, so I created 50 Kafka Connect clusters, one for each.

The naming is a bit confusing: in Strimzi, each of these "clusters" is actually a resource of kind KafkaConnect.

The reasons for creating more "clusters" are:

  • each Kafka Connect cluster has its own group.id, so it is easier in the long term to have one cluster per use case.
  • security, because each cluster has its own user.

Is this approach wrong or is it OK?


Solution

  • I do not think there is a simple yes or no answer to this. Each mode has some pros and some cons.

    Running separate Connect clusters for each project has a lot of advantages:

    • Better isolation of resources: You can decide how much CPU and memory to give to the Connect cluster used by each project. With one shared cluster, you cannot control how much CPU or memory each project will get.
    • Better security isolation: The various user credentials used by one project will not be easily accessible by other projects.
    • Fewer interdependencies between the projects: When each project has its own Connect cluster, it is much easier for projects to, for example, use different versions of the same connector. With one big cluster, they would all need to use the same version of any connector.
    • Less disruption: A rolling update needed for one project (e.g. to add a new connector plugin) would not affect all the other projects.
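
As a rough illustration of the per-project approach, the sketch below generates one KafkaConnect manifest per project, each with its own group.id, its own internal topics, its own user, and its own CPU and memory settings. It assumes the `kafka.strimzi.io/v1beta2` KafkaConnect schema; the project names, bootstrap address, Secret names, and resource sizes are hypothetical placeholders, so check the field names against the Strimzi version you run.

```python
import json


def connect_cluster_manifest(project: str, cpu: str = "500m", memory: str = "1Gi") -> dict:
    """Build a KafkaConnect manifest for one project (all values are illustrative)."""
    name = f"{project}-connect"
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",  # assumed API version
        "kind": "KafkaConnect",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # Hypothetical bootstrap address of the shared Kafka cluster.
            "bootstrapServers": "my-cluster-kafka-bootstrap:9093",
            "tls": {
                "trustedCertificates": [
                    # Hypothetical Secret holding the cluster CA certificate.
                    {"secretName": "my-cluster-cluster-ca-cert", "certificate": "ca.crt"}
                ]
            },
            # Security isolation: each Connect cluster logs in as its own user.
            "authentication": {
                "type": "scram-sha-512",
                "username": name,
                "passwordSecret": {"secretName": name, "password": "password"},
            },
            # Resource isolation: CPU and memory are set per project.
            "resources": {
                "requests": {"cpu": cpu, "memory": memory},
                "limits": {"cpu": cpu, "memory": memory},
            },
            # Each Connect cluster needs a unique group.id and its own internal topics.
            "config": {
                "group.id": name,
                "offset.storage.topic": f"{name}-offsets",
                "config.storage.topic": f"{name}-configs",
                "status.storage.topic": f"{name}-status",
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be saved and passed to `kubectl apply -f`.
    for project in ("billing", "orders"):  # hypothetical project names
        print(json.dumps(connect_cluster_manifest(project), indent=2))
```

Generating the manifests from one template keeps the 50 resources consistent while still letting a single project change its resources or connector plugins without touching the others.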

    But the shared cluster also has some advantages:

    • Lower running costs: With many separate Connect clusters, the total resources they use will likely be higher than with one big cluster, because each Connect cluster carries its own fixed overhead, including the JVM overhead.
    • Easier operations: Strimzi should make it easier for you to manage the clusters. But it might still be easier to monitor one big cluster than 50 small clusters.
    • Less overhead in the Kafka brokers: Each Connect cluster has its own internal topics and partitions, so with 50 Connect clusters your Kafka cluster holds 50 times as many of these topics and partitions as it would with one big Connect cluster. This puts some additional load on the Kafka brokers as well.
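
To put a rough number on the broker overhead, the sketch below counts the internal topics, partitions, and partition replicas that Connect clusters create. It assumes the Kafka Connect defaults as I understand them (25 partitions for the offsets topic, 5 for the status topic, a single partition for the config topic) and a replication factor of 3; your clusters may override any of these.

```python
# Assumed Kafka Connect defaults for the three internal topics.
OFFSET_PARTITIONS = 25   # offset.storage.partitions
STATUS_PARTITIONS = 5    # status.storage.partitions
CONFIG_PARTITIONS = 1    # the config topic always has a single partition
INTERNAL_TOPICS = 3      # offsets, status, config


def internal_overhead(connect_clusters: int, replication_factor: int = 3) -> dict:
    """Count internal topics, partitions, and replicas for N Connect clusters."""
    partitions_per_cluster = OFFSET_PARTITIONS + STATUS_PARTITIONS + CONFIG_PARTITIONS
    partitions = connect_clusters * partitions_per_cluster
    return {
        "topics": connect_clusters * INTERNAL_TOPICS,
        "partitions": partitions,
        "replicas": partitions * replication_factor,
    }


if __name__ == "__main__":
    print("one shared cluster:", internal_overhead(1))
    print("50 separate clusters:", internal_overhead(50))
```

Under these assumptions, one shared cluster needs 3 topics with 31 partitions, while 50 separate clusters need 150 topics with 1550 partitions. Whether that matters depends on how large the Kafka cluster is to begin with.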

    But I'm afraid that, in the end, it is up to you to pick the mode you prefer. Many users love that they can save resources with one big Connect cluster. But many other users want a separate cluster for each connector or each project. So it depends on which of the pros and cons matter most to you.