I am trying to design a CDC pipeline to stream data from Cloud SQL to BigQuery using Datastream and Dataflow on GCP. The Datastream part is working fine, and I can see data being transferred to Cloud Storage successfully in Avro format.
For Dataflow, I am using the Dataflow template "Datastream to BigQuery"
with the configuration in the screenshot.
I can see the Dataflow job started and running with no errors in the log, yet I can't see any data being transferred from Cloud Storage to BigQuery.
It looks to me like something is missing, namely the link between Cloud Storage and Pub/Sub. I think there should be a link to stream the data from GCS to Pub/Sub, and eventually Dataflow would stream from Pub/Sub to BQ, no?
What am I missing here?
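It was something missing on my side: the link between GCS and Pub/Sub. I set it up with the command below, which makes the bucket send notifications for objects under the "db/" prefix to the "datastream" Pub/Sub topic: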
gsutil notification create -f "none" -p "db/" -t "datastream" "gs://my-buk"
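To check that the link is actually in place, something like the following may help. It is only a sketch: the bucket and topic names are taken from the command above, the subscription name "datastream-sub" is a hypothetical placeholder, and the DRY_RUN switch is my own addition so the commands can be previewed without touching the project.

```shell
#!/usr/bin/env bash
# Sketch: verify the GCS -> Pub/Sub link and add a subscription on the topic.
# BUCKET and TOPIC come from the command above; SUBSCRIPTION is a hypothetical name.
BUCKET="gs://my-buk"
TOPIC="datastream"
SUBSCRIPTION="datastream-sub"

# With DRY_RUN=1 (the default) each command is only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the notification configs on the bucket; the one created above should appear.
run gsutil notification list "$BUCKET"

# Create a subscription on the topic so a consumer can read the notifications.
run gcloud pubsub subscriptions create "$SUBSCRIPTION" --topic="$TOPIC"
```

Run it with DRY_RUN=0 to execute the commands for real. As far as I know, the Datastream to BigQuery template takes the subscription through its gcsPubSubSubscription parameter, in the form projects/<PROJECT_ID>/subscriptions/<SUBSCRIPTION_NAME>, so it is worth checking that the job was launched with that value set.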