Tags: python, google-cloud-platform, google-cloud-dataflow, apache-beam, google-cloud-pubsub

Dataflow Job to start based on PubSub Notification - Python


I am writing a Dataflow job which reads from BigQuery and does a few transformations.

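import apache_beam as beam

with beam.Pipeline() as pipeline:
    data = (
        pipeline
        | beam.io.ReadFromBigQuery(
            query='''
            SELECT * FROM `bigquery-public-data.chicago_crime.crime` LIMIT 100
            ''',
            use_standard_sql=True)
        | beam.Map(print)  # print each row (a dict) for inspection
    )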

But my requirement is to read from BigQuery only after receiving a notification from a Pub/Sub topic. The Dataflow job above should start reading data from BigQuery only if the message below is received. If the message has a different job id or a different status, no action should be taken.

Pub/Sub message: {'job_id': 101, 'status': 'Success'}

Any help on this part?


Solution

  • I ended up using Cloud Functions: I added the filtering logic to the function and started the Dataflow job from there. I found this related question useful: "How to trigger a dataflow with a cloud function? (Python SDK)".
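
The filtering step described above could look roughly like the sketch below. It is not the exact code I deployed: it assumes a Pub/Sub-triggered background function with the `(event, context)` signature, where `event["data"]` holds the base64-encoded message body. The names `parse_payload`, `should_start`, `on_pubsub_message`, and `launch_dataflow_job` are all made up for illustration, and `launch_dataflow_job` is only a placeholder for whatever actually starts the job (for example, the template launch from the linked question). The fallback to `ast.literal_eval` is there because the message as written in the question uses single quotes, which is not valid JSON.

```python
import ast
import base64
import json

# Values taken from the message in the question.
EXPECTED_JOB_ID = 101
EXPECTED_STATUS = "Success"


def parse_payload(raw: str) -> dict:
    """Parse the message body as JSON, falling back to a Python-literal dict."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Single-quoted keys (as shown in the question) are not valid JSON.
        payload = ast.literal_eval(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected the message body to be an object")
    return payload


def should_start(payload: dict) -> bool:
    """Return True only for the exact job id and status we are waiting for."""
    return (
        payload.get("job_id") == EXPECTED_JOB_ID
        and payload.get("status") == EXPECTED_STATUS
    )


def launch_dataflow_job() -> None:
    """Hypothetical placeholder: replace with the real Dataflow launch call."""
    print("launching Dataflow job")


def on_pubsub_message(event, context, launch=launch_dataflow_job) -> bool:
    """Entry point (assumed background-function signature).

    Decodes the Pub/Sub message, applies the filter, and calls `launch`
    only when the message matches. Returns whether the job was started.
    """
    raw = base64.b64decode(event["data"]).decode("utf-8")
    if should_start(parse_payload(raw)):
        launch()
        return True
    return False
```

Passing the launcher in as a parameter keeps the filter testable locally without touching Dataflow; any message with a different job id or status simply returns without doing anything.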