apache-kafka apache-kafka-connect debezium mongodb-kafka-connector

Kafka-MongoDB Debezium Connector : distributed mode


I am working on a Debezium MongoDB source connector. Can I run the connector on my local machine in distributed mode, pointing it at a remote Kafka bootstrap server (deployed in Kubernetes) and a remote MongoDB URL?

I tried this and the connector starts successfully with no errors, just a few warnings, but no data flows from MongoDB.

I am using the command below to run the connector:

./bin/connect-distributed ./etc/schema-registry/connect-avro-distributed.properties ./etc/kafka/connect-mongodb-source.properties

If this is not possible, how else can I achieve it? I do not want to install Kafka or MongoDB locally, as most tutorials suggest. I want to use our test servers for this.

I followed this tutorial: https://medium.com/tech-that-works/cloud-kafka-connector-for-mongodb-source-8b525b779772

Here are more details on the issue. The connector starts fine, and I see the lines below at the end of the connector log:

 INFO [Worker clientId=connect-1, groupId=connect-cluster] Starting connectors and tasks using config offset -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1000)
 INFO [Worker clientId=connect-1, groupId=connect-cluster] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1021)

I have also defined the MongoDB config in /etc/kafka/connect-mongodb-source.properties as follows:

name=mongodb-source-connector
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=/remoteserveraddress:27017
mongodb.name=mongo_conn
initial.sync.max.threads=1
tasks.max=1

But data is not flowing between MongoDB and Kafka. I have also posted a separate question for this: Kafka-MongoDB Debezium Connector : distributed mode

Any pointers are appreciated.


Solution

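  • connect-distributed only accepts a single property file (the worker configuration), so the connector properties file passed as the second argument is not loaded.

    You must use the REST API to configure connectors when Kafka Connect runs in distributed mode.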

    https://docs.confluent.io/current/connect/references/restapi.html

    Note: by default, the consumer reads only the latest data from the topic, not existing data.

    To change that, add this to connect-avro-distributed.properties:

    consumer.auto.offset.reset=earliest
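
As a rough illustration of the REST approach, here is a minimal Python sketch that turns the connector properties from the question into the JSON body that `POST /connectors` expects (`name` at the top level, everything else under `config`) and submits it to a worker. The helper names `properties_to_payload` and `create_connector` are hypothetical, and the worker URL `http://localhost:8083` assumes the default REST port of a locally started worker; adjust both to your setup.

```python
import json
import urllib.request


def properties_to_payload(properties_text):
    """Convert .properties-style text into the JSON body for POST /connectors."""
    config = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    # The REST API takes the connector name at the top level of the body.
    name = config.pop("name")
    return {"name": name, "config": config}


def create_connector(connect_url, payload):
    """POST the payload to a Connect worker and return the decoded JSON response."""
    request = urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Connector settings copied from the question's properties file.
SOURCE_PROPERTIES = """
name=mongodb-source-connector
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=/remoteserveraddress:27017
mongodb.name=mongo_conn
initial.sync.max.threads=1
tasks.max=1
"""

# Example call (needs a running worker; the URL is an assumption):
# create_connector("http://localhost:8083", properties_to_payload(SOURCE_PROPERTIES))
```

With this approach the worker is started with only its own properties file, and the connector is registered afterwards over HTTP. A `GET` on `/connectors/mongodb-source-connector/status` against the same worker should then report whether the connector and its task are running.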