Search code examples
postgresqlapache-kafkaapache-kafka-connectdebeziumcdc

Some rows in the Postgres table can generate CDC while others cannot


I have a Postgres DB with CDC setup.

I deployed the Kafka Debezium connector 1.8.0.Final for a Postgres DB by

POST http://localhost:8083/connectors

with body:

{
    "name": "postgres-kafkaconnector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "tasks.max": "1",
        "database.hostname": "example.com",
        "database.port": "5432",
        "database.dbname": "my_db",
        "database.user": "xxx",
        "database.password": "xxx",
        "database.server.name": "postgres_server",
        "table.include.list": "public.products",
        "plugin.name": "pgoutput"
    }
}

I noticed some strange things.

In same table, when I update rows, some rows can generate CDC, but other rows cannot generate CDC.

And those rows are very similar except for id and identifier are different.

-- Updating this row can generate CDC
UPDATE public.products
SET identifier = 'GET /api/accounts2'
WHERE id = '90c21719-ce41-4523-8ad1-ed6b21ecfaf1';

-- Updating this row cannot generate CDC
UPDATE public.products
SET identifier = 'GET /api/notworking/accounts2'
WHERE id = '22f5ebf3-9594-493d-8aa6-649d9fbcefd2';

I checked my Kafka Connect container log, there is no error neither.

Any idea?


Solution

  • Found the issue! It is because my Kafka Connector postgres-kafkaconnector was initially pointing to a DB (stage1), then I switched to another DB (stage2) by updating

    "database.hostname": "example.com",
    "database.port": "5432",
    "database.dbname": "my_db",
    "database.user": "xxx",
    "database.password": "xxx",
    

    However, they are using same configuration properties in the Kafka Connect I deployed in the very beginning:

    config.storage.topic
    offset.storage.topic
    status.storage.topic
    

    Since this connector with different DB config shared same above Kafka configuration properties, nd the database table schemas are same,

    it became mess due to sharing same Kafka offset.

    One simple way to fix is when deploying Kafka connector to test on different DBs, using different names such as postgres-kafkaconnector-stage1 and postgres-kafkaconnector-stage2 to avoid Kafka topic offset mess.