Tags: mongodb, apache-kafka, apache-kafka-connect, changestream, mongodb-kafka-connector

Distributed official MongoDB Kafka source connector with multiple tasks not working


I am running Apache Kafka on my Windows machine with two Kafka Connect workers (ports 8083 and 8084) and one topic with three partitions (replication factor of one). My issue is that I can see fail-over to the other Kafka Connect worker whenever I shut one of them down, but load balancing is not happening because the number of tasks is always one. I am using the official MongoDB Kafka connector as a source (change stream) with tasks.max=6. I tried updating MongoDB from multiple threads so that it would push more data into Kafka Connect and perhaps cause more tasks to be created, but even under a higher volume of data the task count remains one.
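For reference, my connector configuration looks roughly like the following (the connection URI, database, collection, and topic prefix are placeholders for my actual values):

    # register/update the source connector on the worker at port 8083
    curl -X PUT http://localhost:8083/connectors/mongodb-connector/config \
      -H "Content-Type: application/json" \
      -d '{
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "mydb",
        "collection": "mycollection",
        "topic.prefix": "mongo",
        "tasks.max": "6"
      }'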

How did I confirm that only one task is running? Through the REST API at http://localhost:8083/connectors/mongodb-connector/status, which returns:

    {
      "name": "mongodb-connector",
      "connector": {
        "state": "RUNNING",
        "worker_id": "xx.xx.xx.xx:8083"
      },
      "tasks": [
        {
          "id": 0,
          "state": "RUNNING",
          "worker_id": "xx.xx.xx.xx:8083"
        }
      ],
      "type": "source"
    }

Am I missing something here? Why are more tasks not created?
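For anyone reproducing this check, the worker's REST API also exposes the task list directly, for example:

    # overall connector and task status (as shown above)
    curl http://localhost:8083/connectors/mongodb-connector/status

    # just the tasks; in my case this only ever lists task 0
    curl http://localhost:8083/connectors/mongodb-connector/tasks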


Solution

  • It seems this is the intended behavior of the official MongoDB Kafka source connector. This is the answer I got on another forum from Ross Lawley (MongoDB developer):

    Prior to 1.2.0, only a single task was supported by the sink connector. The source connector still only supports a single task, because it uses a single change stream cursor. This is enough to watch and publish changes cluster-wide, database-wide, or down to a single collection.

    I raised this ticket: https://jira.mongodb.org/browse/KAFKA-121 and got the following response:

    The source connector will only ever produce a single task. This is by design, as the source connector is backed by a change stream. Change streams internally use the same data as the replication engine and as such should be able to scale as the database does. There are no plans to allow multiple cursors; however, should you feel that this is not meeting your requirements, you can configure multiple connectors, and each would have its own change stream cursor.
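    A rough sketch of that last suggestion, assuming two collections named orders and customers (the connector names, connection URI, database, and topic prefix are placeholders): each connector gets its own change stream cursor, so the load is split per collection rather than per task.

        # one connector (and therefore one change stream) per collection
        curl -X PUT http://localhost:8083/connectors/mongo-source-orders/config \
          -H "Content-Type: application/json" \
          -d '{
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": "mongodb://localhost:27017",
            "database": "mydb",
            "collection": "orders",
            "topic.prefix": "mongo"
          }'

        curl -X PUT http://localhost:8083/connectors/mongo-source-customers/config \
          -H "Content-Type: application/json" \
          -d '{
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": "mongodb://localhost:27017",
            "database": "mydb",
            "collection": "customers",
            "topic.prefix": "mongo"
          }'

    With separate connectors, the Connect cluster can still rebalance them across the two workers, even though each connector only ever runs a single task.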