Tags: apache-spark, pyspark, databricks, azure-databricks, spark-structured-streaming

Spark structured streaming from JDBC source


Can someone let me know if it's possible to do Spark Structured Streaming from a JDBC source, e.g. SQL Server or any other RDBMS?

I have looked at a few similar questions on SO, e.g.:

Spark streaming jdbc read the stream as and when data comes - Data source jdbc does not support streamed reading

jdbc source and spark structured streaming

However, I would like to know whether it's officially supported in Apache Spark.

If there is any sample code, that would be helpful.

Thanks


Solution

  • I am currently on a project architecting exactly this: CDC with SharePlex from Oracle, writing the change events to Kafka, and then using Spark Structured Streaming with the Kafka integration and MERGE into Delta format on HDFS.

    That is, this is the way to do it if you are not using Debezium. You can use change logs on the base tables or materialized view logs to feed the CDC pipeline.

    So no, reading a stream directly from a JDBC source is not possible. A minimal sketch of the Kafka-to-Delta part is below.
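
    The following PySpark sketch illustrates the pattern described above: read the CDC events from Kafka with Structured Streaming, then MERGE each micro-batch into a Delta table via foreachBatch. The broker address, topic name, event schema, table path, and key/operation columns are all placeholders you would replace with your own; this is an illustration of the approach, not the exact code from the project.

    # Sketch: Kafka CDC stream -> Structured Streaming -> MERGE into Delta.
    # Topic, schema, paths and key columns below are assumptions for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType
    from delta.tables import DeltaTable

    spark = SparkSession.builder.appName("cdc-kafka-to-delta").getOrCreate()

    # Assumed shape of the CDC events published to Kafka (adjust to your feed).
    event_schema = StructType([
        StructField("id", StringType()),
        StructField("op", StringType()),        # e.g. I / U / D
        StructField("payload", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read the CDC stream from Kafka (broker and topic are placeholders).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "oracle.cdc.topic")
           .option("startingOffsets", "latest")
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    def upsert_to_delta(batch_df, batch_id):
        # MERGE each micro-batch into the target Delta table on the key column,
        # applying deletes, updates and inserts according to the CDC operation.
        target = DeltaTable.forPath(spark, "/delta/target_table")
        (target.alias("t")
         .merge(batch_df.alias("s"), "t.id = s.id")
         .whenMatchedDelete(condition="s.op = 'D'")
         .whenMatchedUpdateAll(condition="s.op != 'D'")
         .whenNotMatchedInsertAll(condition="s.op != 'D'")
         .execute())

    query = (events.writeStream
             .foreachBatch(upsert_to_delta)
             .option("checkpointLocation", "/delta/_checkpoints/cdc")
             .start())

    query.awaitTermination()

    Note that foreachBatch is what makes the MERGE possible here: Delta's MERGE is a batch operation, so each streaming micro-batch is upserted as a batch against the target table.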