google-cloud-platform google-bigquery google-cloud-dataflow cdc google-datastream

Google Cloud Datastream to BigQuery template not syncing data to BigQuery


I am trying to design a CDC pipeline that streams data from Cloud SQL to BigQuery using Datastream and Dataflow on GCP. The Datastream part is working fine, and I can see data being written to Cloud Storage successfully in Avro format.

For the Dataflow part, I am using the Datastream to BigQuery Dataflow template with the configuration shown in the screenshot.

I can see that the Dataflow job has started and is running with no errors in the log, yet no data is being transferred from Cloud Storage to BigQuery.

It looks to me like something is missing, namely the link between Cloud Storage and Pub/Sub. I think there should be a link that streams the data from GCS to Pub/Sub, so that Dataflow can then stream from Pub/Sub to BQ, no?

What am I missing here?

[Screenshot: Dataflow template configuration]


Solution

  • What was missing on my side was the link between GCS and Pub/Sub, which I set up using the command below:

    gsutil notification create -f "none" -p "db/" -t "datastream" "gs://my-buk"
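
For anyone setting this up from scratch, here is a rough sketch of how the pieces around that command might fit together. It assumes the bucket, prefix and topic from the command above; the subscription name `datastream-sub` is a made-up placeholder, and I am assuming the Dataflow job is pointed at that subscription through the template's Pub/Sub subscription parameter. The script only prints the commands by default, so you can check them before running anything for real.

```shell
#!/usr/bin/env bash
set -u

# Bucket, prefix and topic are taken from the command above.
# The subscription name is a hypothetical placeholder.
BUCKET="gs://my-buk"
PREFIX="db/"
TOPIC="datastream"
SUBSCRIPTION="datastream-sub"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Topic that receives the bucket's object notifications
#    (this step fails harmlessly if the topic already exists).
run gcloud pubsub topics create "$TOPIC"

# 2. Subscription on that topic, for the Dataflow job to read from.
run gcloud pubsub subscriptions create "$SUBSCRIPTION" --topic="$TOPIC"

# 3. Same command as in the answer: notify the topic about objects
#    written under the Datastream output prefix.
run gsutil notification create -f none -p "$PREFIX" -t "$TOPIC" "$BUCKET"

# 4. List the bucket's notification configs to confirm the link exists.
run gsutil notification list "$BUCKET"
```

Running it with `DRY_RUN=0` executes the commands against whatever project your gcloud and gsutil configuration currently points at, so double-check the names first.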