I am writing a Dataflow job that reads from BigQuery and applies a few transformations:
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
    data = (
        pipeline
        | beam.io.ReadFromBigQuery(query='''
              SELECT * FROM `bigquery-public-data.chicago_crime.crime` LIMIT 100
          ''', use_standard_sql=True)
        | beam.Map(print)
    )
```
But my requirement is to read from BigQuery only after receiving a notification from a Pub/Sub topic. The above Dataflow job should start reading from BigQuery only if the message below is received; if the message carries a different job id or a different status, no action should be taken.
Pub/Sub message: `{'job_id': 101, 'status': 'Success'}`
Any help on this part?
I ended up using Cloud Functions: I added the filtering logic there and started the Dataflow job from it. I found the link below useful. How to trigger a dataflow with a cloud function? (Python SDK)
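For reference, here is a minimal sketch of what that Cloud Function can look like. It assumes the Dataflow pipeline has been staged as a template in GCS, that the message is published as valid JSON (double quotes), and that the function is deployed with a Pub/Sub trigger on the topic; the project, region, template path, and job name are placeholders:

```python
import base64
import json

from googleapiclient.discovery import build

# Placeholder values -- replace with your own project, region, and template path.
PROJECT = 'my-project'
REGION = 'us-central1'
TEMPLATE_PATH = 'gs://my-bucket/templates/bq-read-template'


def trigger_dataflow(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    Launches the Dataflow job only when the message carries the
    expected job_id and status; otherwise it returns without action.
    """
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))

    # Filtering logic: ignore anything but {"job_id": 101, "status": "Success"}.
    if payload.get('job_id') != 101 or payload.get('status') != 'Success':
        return

    # Launch the staged Dataflow template via the Dataflow REST API.
    dataflow = build('dataflow', 'v1b3', cache_discovery=False)
    request = dataflow.projects().locations().templates().launch(
        projectId=PROJECT,
        location=REGION,
        gcsPath=TEMPLATE_PATH,
        body={'jobName': 'bq-read-job'},
    )
    request.execute()
```

The function runs on every message published to the topic, but it only calls the Dataflow API when the filter matches, so messages with a different job id or status are silently dropped.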