Tags: apache-kafka, apache-kafka-connect, confluent-kafka-dotnet

Custom Connector for Apache Kafka


I am looking to write a custom connector for Apache Kafka that connects to a SQL database to get CDC data. I would like a custom connector so that I can connect to multiple databases with one connector, because all the marketplace connectors only offer one database per connector.

First question: Is it possible to connect to multiple databases using one custom connector? Also, in that custom connector, can I define which topics the data should go to?

Second question: Can I write a custom connector in .NET, or does it have to be Java? Is there an example of a custom CDC connector for a database in .NET that I can look at?


Solution

  • There are no .NET examples. The Kafka Connect API is Java only, and not specific to Confluent.

    Source is here - https://github.com/apache/kafka/tree/trunk/connect

    Dependency here - https://search.maven.org/artifact/org.apache.kafka/connect-api

    "looking to write a custom connector ... to connect to SQL database to get CDC data"

    You could extend or contribute to Debezium, if you really wanted this feature.

    "connect to multiple databases using one custom connector"

    If you mean database servers, then not really, no. Your URL would have to be unique per connector task, and there isn't an API to map a task number to a config value. If you mean one server with multiple database schemas, then I also don't think it is really possible to properly "distribute" that work within a single connector with multiple tasks (which is why the database.names config in Debezium currently supports only one name).

    "explored debezium but it won't work for us because we have microservices architecture and we have more than 1000 databases for many clients and debezium creates one topic for each table which means it is going to be a massive architecture"

    Kafka can handle thousands of topics fine. If you run the connector processes in Kubernetes, as an example, then they're centrally deployable, scalable, and configurable from there.

    However, I still question whether you really need to capture CDC events from all of those databases.

    Using Maxwell was also suggested previously.
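
To illustrate the "one connector per database, configured centrally" approach mentioned above, here is a minimal Java sketch that generates one connector definition per database. It is not a complete or verified setup: the class name ConnectorConfigs, the host db.example.internal, the database names, and the "cdc." naming scheme are all hypothetical, and the Debezium SQL Server property names (connector.class, database.hostname, database.names, topic.prefix) are assumed from Debezium 2.x; a real config needs more properties (port, credentials, schema history settings).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConnectorConfigs {

    // Builds a partial, Debezium-style config for ONE database.
    // Property names are assumed from Debezium 2.x; values are illustrative.
    public static Map<String, String> configFor(String host, String database) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("connector.class", "io.debezium.connector.sqlserver.SqlServerConnector");
        cfg.put("database.hostname", host);
        cfg.put("database.names", database);
        // Hypothetical naming scheme: one topic prefix per database.
        cfg.put("topic.prefix", "cdc." + database);
        return cfg;
    }

    // Renders {"name": ..., "config": {...}}, the JSON shape that the
    // Kafka Connect REST API accepts when creating a connector.
    public static String toJson(String name, Map<String, String> cfg) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(escape(name)).append("\",\"config\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : cfg.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append("}}").toString();
    }

    // Escapes backslashes and double quotes so values stay valid JSON strings.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Hypothetical database list; in practice this might come from a client registry.
        List<String> databases = List.of("client_a", "client_b");
        for (String db : databases) {
            System.out.println(toJson("cdc-" + db, configFor("db.example.internal", db)));
        }
    }
}
```

Each printed JSON document could then be sent as the body of a POST to the /connectors endpoint of a Kafka Connect worker, so adding a database means adding one entry to the list rather than writing a new connector.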