google-cloud-platform google-bigquery google-cloud-dataflow google-cloud-spanner

Dataflow job is not picking up on new Spanner change stream tables for replication


I have a change stream that watches all my Spanner tables. A Dataflow job listens to the change stream and writes my data into BigQuery.

However, when I add new tables to Spanner, the Dataflow pipeline doesn't seem to pick up the newly watched tables.

I am using the default (off-the-shelf) Spanner change streams to BigQuery Dataflow template (--template-file-gcs-location=gs://dataflow-templates-us-central1/2023-08-01-00_RC00/flex/Spanner_Change_Streams_to_BigQuery).
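For reference, here is a minimal sketch of how a launch of that template could be assembled, assuming it is started with `gcloud dataflow flex-template run`. The template path and the change stream name `AllStream` come from the question; the job name and the instance, database, and dataset names are hypothetical placeholders. The parameter keys are the template's required parameters as I understand its documentation, so check them against the template version you run.

```python
import shlex

# Template path taken from the question above.
TEMPLATE = (
    "gs://dataflow-templates-us-central1/2023-08-01-00_RC00/flex/"
    "Spanner_Change_Streams_to_BigQuery"
)


def build_launch_command(job_name: str, region: str, params: dict[str, str]) -> list[str]:
    """Return a gcloud argv list that launches the flex template with the given parameters."""
    # gcloud takes the template parameters as comma-separated KEY=VALUE pairs
    # in a single --parameters flag (values here contain no commas).
    joined = ",".join(f"{key}={value}" for key, value in params.items())
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--template-file-gcs-location={TEMPLATE}",
        f"--region={region}",
        f"--parameters={joined}",
    ]


if __name__ == "__main__":
    params = {
        "spannerInstanceId": "my-instance",           # hypothetical
        "spannerDatabase": "my-database",             # hypothetical
        "spannerMetadataInstanceId": "my-instance",   # hypothetical
        "spannerMetadataDatabase": "my-metadata-db",  # hypothetical
        "spannerChangeStreamName": "AllStream",       # from the question
        "bigQueryDataset": "my_dataset",              # hypothetical
    }
    # Print a shell-quoted command that can be copied into a terminal.
    print(shlex.join(build_launch_command("spanner-to-bq", "us-central1", params)))
```

Keeping the parameters in one dict makes it easy to relaunch later with exactly the same configuration.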

What I've tried

  • I made sure the change stream captures all tables by re-running ALTER CHANGE STREAM AllStream SET FOR ALL; (it succeeded).
  • I restarted my Dataflow job (canceled the old one and started a fresh one).

However, my new table is still not being replicated (it does not show up in my BigQuery dataset). Any suggestions on what else I could try?


Solution

  • Good point, and your observations are correct. The Spanner change streams to BigQuery Dataflow template does not apply schema changes (such as a newly added table) on its own. Instead, the recommended workflow is to:

    1. Cancel the Dataflow pipeline and note the low watermark
    2. Apply the changes in BigQuery
    3. Apply the changes in Spanner
    4. Create a new pipeline with the same configuration as the original one and the previous low watermark as the start timestamp
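Step 4 can be sketched like this, assuming you noted the low watermark as a Python `datetime` and kept the original job's parameters in a dict. It relies on the template's optional `startTimestamp` parameter, which (as I understand the template docs) takes an RFC 3339 timestamp such as `2021-10-12T07:20:50.52Z`. The helper names are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta, timezone


def to_start_timestamp(low_watermark: datetime) -> str:
    """Format a low watermark as an RFC 3339 UTC string ending in 'Z'."""
    if low_watermark.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        low_watermark = low_watermark.replace(tzinfo=timezone.utc)
    utc = low_watermark.astimezone(timezone.utc)
    # %f always yields six fractional digits (microseconds).
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def restart_parameters(original: dict[str, str], low_watermark: datetime) -> dict[str, str]:
    """Copy the cancelled job's parameters and resume reading from the low watermark."""
    params = dict(original)  # same configuration as the original pipeline
    params["startTimestamp"] = to_start_timestamp(low_watermark)
    return params


if __name__ == "__main__":
    original = {"spannerChangeStreamName": "AllStream", "bigQueryDataset": "my_dataset"}
    # Hypothetical watermark noted in a UTC+2 timezone.
    noted = datetime(2023, 9, 1, 14, 34, 56, tzinfo=timezone(timedelta(hours=2)))
    print(restart_parameters(original, noted)["startTimestamp"])  # 2023-09-01T12:34:56.000000Z
```

Copying the dict instead of mutating it keeps the original configuration intact in case you need to compare or relaunch again.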

    Let me know if that worked for you!