Tags: amazon-redshift, apache-kafka-connect, debezium

Debezium connector for Redshift based on the existing PostgreSQL one


I have successfully used the PostgreSQL Debezium plugin for Kafka Connect. This connector hooks directly into the relational database's write-ahead log (WAL), which vastly improves performance compared to the normal JDBC connector, which continuously polls the database via an SQL query.
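For context, this is the kind of connector registration I mean. The sketch below is a minimal, illustrative example only: host names, credentials, topic prefix, and table names are placeholders, and exact property names can vary between Debezium versions.

```python
# Minimal sketch: registering a Debezium PostgreSQL source connector via the
# Kafka Connect REST API (default port 8083). All hosts, credentials, and
# table names below are placeholders, not values from this question.
import json
import requests

connector_config = {
    "name": "inventory-postgres-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres.example.internal",  # placeholder host
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",            # prefix for the change topics
        "table.include.list": "public.orders",  # tables streamed from the WAL
        "plugin.name": "pgoutput",              # logical decoding plugin
    },
}

resp = requests.post(
    "http://connect.example.internal:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector_config),
)
resp.raise_for_status()
print(resp.json())
```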

Is something similar possible with Redshift as the source instead of PostgreSQL? I know there are major differences between Redshift and PostgreSQL: Redshift is columnar, cluster-based, has no secondary indexes, and serves different use cases. I could not find definitive information on whether Redshift has anything similar to a write-ahead log or whether it uses a completely different approach.

Is there a write-ahead-log-based approach to stream data changes from a Redshift table directly into Kafka, via Debezium or some other way, or is it not technically possible? If not, is there an alternative that achieves the same thing?


Solution

  • To answer your question in one line: no, it's not supported, and I'm sure that AWS (or any modern DW vendor) will never even consider enabling this feature.

    Here are two strong reasons from my point of view:

    • Redshift itself gets its data from a different database (like your Postgres), and its main purpose is reading, not writing (so there are far fewer concurrent writes).
    • For analytical purposes we bring all the data into a DW. From there it goes to a BI tool or to ML-related workloads. But I have never seen a setup where DW data flows to another database in real or near real time.

    (You might already know this option.) If you still need to do this, then you are getting the data into Redshift from some source, right? Use that same source to send the data wherever you would have consumed the Redshift CDC stream, as in the sketch below.
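For example, if the upstream Postgres is already captured with Debezium (as in the question), the same change topic can be fanned out to a second sink alongside the Redshift load. The sketch below assumes the Confluent JDBC sink connector and Debezium's ExtractNewRecordState transform; connector names, hosts, credentials, and topic names are placeholders.

```python
# Minimal sketch: fanning the existing Debezium change topic out to a second
# target by registering an additional sink connector on the same Kafka Connect
# cluster. Property names assume the Confluent JDBC sink connector; all hosts,
# credentials, and names are placeholders.
import json
import requests

sink_config = {
    "name": "orders-second-target-sink",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "inventory.public.orders",  # same topic the Redshift load reads
        "connection.url": "jdbc:postgresql://target.example.internal:5432/reporting",
        "connection.user": "loader",
        "connection.password": "secret",
        # Flatten the Debezium change envelope (before/after/op) into a plain row.
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "insert.mode": "upsert",      # apply changes idempotently
        "pk.mode": "record_key",      # use the Debezium record key as primary key
        "auto.create": "true",        # create the target table if missing
    },
}

resp = requests.post(
    "http://connect.example.internal:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(sink_config),
)
resp.raise_for_status()
print(resp.json())
```

This way the second consumer sees the same change stream the DW is built from, without trying to extract CDC from Redshift itself.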