Search code examples
mysqlinfinite-loopchange-data-capture

Prevent rows from being read by a CDC program for MySQL to avoid redundant data


I'm connecting the database of a legacy application to another database by using a CDC tool (I'm using Zendesk's Maxwell) to read data from the database, and another program I'm writing to write data into the database that originated elsewhere.

The problem is that when the foreign data is written into the legacy database, the CDC tool will pick it up. I want to prevent this, since that foreign data already exists in the other system.

I thought about adding a column to all the tables in the legacy database, called _origin for example, and putting the origin of each row in that column. The issue with that is that the legacy application does a lot of updates, so the CDC tool would miss real updates since, the _origin column wouldn't change.

Is there a way to somehow write metadata into the MySQL binlog to indicate the origin of this specific transaction? I would have to figure out how to read it using the CDC tool or modify a CDC tool to read such metadata, but I want to see if it's even possible.

Or, is there a better way to do this?


Solution

  • No, there's no way to write your own custom metadata into the binlog. Just the data itself, and certain session variables.

    One solution could be when you read data from the CDC to insert into your destination database, use REPLACE instead of INSERT. The syntax is the same, but it overwrites instead of appends if the row already exists.