A source table in an SQL database receives new rows every second.
I want to run some Spark code (maybe with Structured Streaming?) once per day to append the rows added since the last run; it is fine if the copy is up to one day stale. The copy would be a Delta table on Databricks.
I'm not sure spark.readStream will work, since the source table is not Delta but a JDBC (SQL) source.
Structured Streaming doesn't support a JDBC source: link
If you have a strictly increasing column in your source table, you can read it in batch mode and store your progress in the userMetadata of your target Delta table: link
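
Here is a minimal sketch of that pattern, assuming the source table has a strictly increasing integer column named `id`; the JDBC URL, credentials, table name, and Delta path are all placeholders you would replace with your own:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target_path = "/mnt/delta/copy_of_source"  # hypothetical target location

# Recover the high-water mark stored in the last commit's userMetadata;
# default to 0 if the Delta table doesn't exist yet or has no metadata.
last_id = 0
if DeltaTable.isDeltaTable(spark, target_path):
    user_metadata = (
        DeltaTable.forPath(spark, target_path)
        .history(1)
        .select("userMetadata")
        .collect()[0]["userMetadata"]
    )
    if user_metadata is not None:
        last_id = int(user_metadata)

# Batch-read only the rows added since the last run.
new_rows = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://host:1433;databaseName=db")  # placeholder
    .option("query", f"SELECT * FROM source_table WHERE id > {last_id}")
    .option("user", "user")          # placeholder credentials
    .option("password", "password")
    .load()
)

if new_rows.head(1):  # skip the commit entirely if nothing is new
    new_max_id = new_rows.agg({"id": "max"}).collect()[0][0]
    (
        new_rows.write.format("delta")
        .mode("append")
        .option("userMetadata", str(new_max_id))  # persist progress in the commit
        .save(target_path)
    )
```

Scheduling this as a once-a-day Databricks job gives you the at-most-one-day staleness you described, and because the watermark lives in the same commit as the appended data, the progress marker and the rows can't get out of sync.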