Tags: apache-flink, flink-streaming, cdc, flink-sql, flink-table-api

Difference between Flink mysql and mysql-cdc connector?


To enrich our data stream, we are planning to connect a MySQL (MemSQL) server to our existing Flink streaming application.

As far as we can see, Flink provides a Table API with a JDBC connector: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/

I also found another MySQL connector, Flink CDC (https://ververica.github.io/flink-cdc-connectors/master/content/about.html), which lets you work with an external database in a streaming fashion.

What is the difference between them, and which one is the better choice in my case?


Solution

  • Change Data Capture (CDC) connectors capture all changes (inserts, updates and deletes) happening in one or more tables. Each change event usually carries a "before" and an "after" image of the row. The Flink CDC connectors can be used directly in Flink in unbounded (streaming) mode, without needing something like Kafka in between.

    The regular JDBC connector can be used in bounded mode (a one-time scan of the table's current contents) and as a lookup table.

    If you're looking to enrich your existing stream, you most likely want the lookup functionality. It lets you query a table for a specific key (coming from your stream) and enrich the stream with the matching data from that table. Keep in mind that, from a performance perspective, you are best off using a temporal table join (`FOR SYSTEM_TIME AS OF`). See the example at https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/jdbc/#how-to-create-a-jdbc-table
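To make the difference between the two approaches concrete without a Flink cluster or a MySQL server, here is a minimal plain-Java model. It does not use any Flink or Flink CDC API; `Row`, `ChangeEvent`, `applyChange` and `enrich` are hypothetical names for illustration only, and a `HashMap` stands in for the database table. The CDC side keeps its own copy of the table and applies each change event's before/after images as they arrive, while the lookup side simply fetches the current row for each incoming key, which is what a JDBC lookup join does (in Flink SQL, via `JOIN ... FOR SYSTEM_TIME AS OF`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class EnrichmentModel {

    // Hypothetical row of the MySQL table being used for enrichment.
    record Row(int id, String name) {}

    // Hypothetical change event modeled on the before/after shape described above:
    // before == null for an insert, after == null for a delete, both set for an update.
    record ChangeEvent(Row before, Row after) {}

    // CDC style: the job maintains its own copy of the table by applying every change.
    static void applyChange(Map<Integer, Row> state, ChangeEvent e) {
        if (e.before() != null) {
            state.remove(e.before().id()); // drop the old version of the row
        }
        if (e.after() != null) {
            state.put(e.after().id(), e.after()); // store the new version of the row
        }
    }

    // Lookup style: for each stream record, fetch the current row by key.
    // In this model the "table" is a Map; with the JDBC connector it would be a query.
    static String enrich(int customerId, Map<Integer, Row> table) {
        return Optional.ofNullable(table.get(customerId))
                .map(Row::name)
                .orElse("<unknown>");
    }

    public static void main(String[] args) {
        Map<Integer, Row> database = new HashMap<>();
        database.put(1, new Row(1, "alice"));

        // Lookup: the event for customer 1 is enriched with whatever the table holds now.
        System.out.println(enrich(1, database)); // prints: alice

        // CDC: replay an insert, an update, another insert and a delete into local state.
        Map<Integer, Row> cdcState = new HashMap<>();
        applyChange(cdcState, new ChangeEvent(null, new Row(1, "alice")));
        applyChange(cdcState, new ChangeEvent(new Row(1, "alice"), new Row(1, "alicia")));
        applyChange(cdcState, new ChangeEvent(null, new Row(2, "bob")));
        applyChange(cdcState, new ChangeEvent(new Row(2, "bob"), null));
        System.out.println(enrich(1, cdcState) + " " + enrich(2, cdcState)); // prints: alicia <unknown>
    }
}
```

The trade-off the model hints at: the lookup approach keeps no copy of the table in the job but has to reach the database for each key (unless a lookup cache is configured), whereas the CDC approach sees every change as it happens and answers from local state, at the cost of keeping that state in the job.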