kubernetes apache-kafka apache-kafka-connect strimzi

Exotic choice for Kafka-Connect Deployment


I currently own a Kafka Connect cluster consisting of two workers deployed across two EC2 instances. I create connectors through a client layer that relies on the Kafka Connect REST API. I also version them that way: I store the configs in JSON files and deploy them by sending them in API requests.

Currently I am trying to solve two problems:

  • Scalability: As the number of connectors my users create grows, with widely varying throughput and record sizes, I constantly have to monitor the heap size every time new connectors are added.

  • UX: Another team uses the Strimzi Operator to let their users create topics by simply adding manifests to their own repositories. I want my Kafka Connect client to work the same way, so that users can create a topic and a connector in the same place and no longer have to rewrite all the boilerplate that a connector config requires.

I am thinking of migrating to Kubernetes. However, simply moving the servers to pods, or even adding an autoscaler, will not solve much. I would like to think about an architecture where a request for a connector means the deployment of a dedicated, isolated Kafka Connect worker with a different group id. That way, connectors do not share JVM resources.

I haven't found such an architecture proposed elsewhere and, to be honest, I have my doubts, especially since this seems to go against the Kafka Connect concepts themselves.

Deploying Kafka Connect connectors in Kubernetes

is this supposed to make it an SW question...


Solution

  • The only real concept of Kafka Connect is that of workers and tasks; there's no recommended deployment architecture.

    Connectors will always share the JVM resources of the cluster unless you adopt a model where each connector gets its own dedicated Connect "cluster". This can easily be accomplished with containers, for example on ECS/Fargate or Kubernetes (EKS). EC2 is too rigid without an ASG, and even with one it doesn't provide the resource isolation you're asking for.

    Strimzi can do that, but scaling has its own limits: sink connectors cannot usefully scale beyond the partition count of their topics, and source connectors sometimes cannot scale much at all (a Debezium or JDBC source should have no more than one task per table).

    The KafkaConnect and KafkaConnector CRDs in Strimzi let you skip the JSON files and REST calls: connectors are declared as manifests, and the operator deploys and manages the connectors and their tasks on its own.
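
To make the "one dedicated Connect cluster per connector" model more concrete, here is a rough Python sketch that generates the pair of Strimzi manifests for a single connector. It is only an illustration under several assumptions: the function names, the `<name>-connect` naming convention, the memory and heap defaults, and the bootstrap address are all hypothetical, and it assumes the Connect image already contains the connector plugin (otherwise a custom image or a Strimzi `build` section would be needed). The resulting JSON can be applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.

```python
import json


def useful_sink_tasks(requested_tasks: int, topic_partitions: int) -> int:
    """Cap sink tasks at the partition count; extra tasks would sit idle."""
    return max(1, min(requested_tasks, topic_partitions))


def dedicated_connect_manifests(name, bootstrap, connector_class, connector_config,
                                tasks_max=1, heap="1g", memory="2Gi"):
    """Build a KafkaConnect + KafkaConnector pair for one isolated connector.

    Naming and sizing defaults are illustrative assumptions, not Strimzi defaults.
    """
    cluster = f"{name}-connect"  # hypothetical naming convention
    connect = {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaConnect",
        "metadata": {
            "name": cluster,
            # Lets the operator manage connectors from KafkaConnector resources.
            "annotations": {"strimzi.io/use-connector-resources": "true"},
        },
        "spec": {
            "replicas": 1,
            "bootstrapServers": bootstrap,
            "config": {
                # A distinct group id and internal topics keep this
                # Connect cluster separate from every other one.
                "group.id": cluster,
                "offset.storage.topic": f"{cluster}-offsets",
                "config.storage.topic": f"{cluster}-configs",
                "status.storage.topic": f"{cluster}-status",
            },
            # Per-connector pod sizing: this is where the isolation comes from.
            "resources": {"requests": {"memory": memory}, "limits": {"memory": memory}},
            "jvmOptions": {"-Xms": heap, "-Xmx": heap},
        },
    }
    connector = {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaConnector",
        # The label binds the connector to its dedicated Connect cluster.
        "metadata": {"name": name, "labels": {"strimzi.io/cluster": cluster}},
        "spec": {"class": connector_class, "tasksMax": tasks_max, "config": connector_config},
    }
    return connect, connector


if __name__ == "__main__":
    # Example values only; the connector class must exist in the Connect image.
    connect, connector = dedicated_connect_manifests(
        name="orders-sink",
        bootstrap="my-cluster-kafka-bootstrap:9092",
        connector_class="org.apache.kafka.connect.file.FileStreamSinkConnector",
        connector_config={"topics": "orders", "file": "/tmp/orders.txt"},
        tasks_max=useful_sink_tasks(requested_tasks=8, topic_partitions=3),
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [connect, connector]}, indent=2))
```

A template like this could live next to the topic manifests in the users' repositories, so that the only things a user supplies are the connector name, class, and config, while the boilerplate (group id, internal topics, sizing) is filled in for them.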