I am trying to pass data from Cloud Pub/Sub to Google Cloud Storage. When I use the DataflowRunner, the pipeline gets published to Google Cloud Dataflow and works as expected. However, for testing I'd like the pipeline to run locally (while still reading from Cloud Pub/Sub and writing to Cloud Storage). When I use the DirectRunner, the process writes out

INFO:apache_beam.runners.direct.direct_runner:Running pipeline with DirectRunner.

but then does nothing when a new message is published to Pub/Sub.
I am executing the pipeline with this command (in the Windows shell, hence the ^ line continuations):
python dev_radim_dataflow_gcs_direct.py ^
--project=<GCP_PROJECT> ^
--region="europe-west3" ^
--input_subscription="projects/data-uat-280814/subscriptions/dev-radim-dataflow" ^
--output_path=gs://dev_radim/dataflow_dest_local/ ^
--runner=DirectRunner ^
--window_size=1 ^
--temp_location=gs://dev_radim/dataflow_temp_local/
The full dev_radim_dataflow_gcs_direct.py file is here: https://pastebin.com/W7VphH5A
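In case the pastebin link is unavailable: below is a minimal sketch of this kind of Pub/Sub-to-GCS streaming pipeline, not the exact file. The WriteBatchToGcs helper and the transform labels are illustrative stand-ins for whatever the real script does.

import argparse

import apache_beam as beam
from apache_beam.io.gcp.gcsio import GcsIO
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window


class WriteBatchToGcs(beam.DoFn):
    """Illustrative helper: writes each windowed batch to one GCS object."""

    def __init__(self, output_path):
        self.output_path = output_path

    def process(self, batch, win=beam.DoFn.WindowParam):
        # One file per window, named after the window bounds.
        ts_format = "%Y%m%d-%H%M%S"
        start = win.start.to_utc_datetime().strftime(ts_format)
        end = win.end.to_utc_datetime().strftime(ts_format)
        filename = "{}{}-{}.txt".format(self.output_path, start, end)
        with GcsIO().open(filename=filename, mode="w") as f:
            for element in batch:
                f.write("{}\n".format(element).encode("utf-8"))


def run():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_subscription", required=True)
    parser.add_argument("--output_path", required=True)
    parser.add_argument("--window_size", type=int, default=1,
                        help="Window size in minutes.")
    known_args, pipeline_args = parser.parse_known_args()

    # Unbounded Pub/Sub reads require streaming mode on both runners.
    options = PipelineOptions(pipeline_args, streaming=True,
                              save_main_session=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read from Pub/Sub" >> beam.io.ReadFromPubSub(
                subscription=known_args.input_subscription)
            | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
            | "Window" >> beam.WindowInto(
                window.FixedWindows(known_args.window_size * 60))
            | "Add dummy key" >> beam.Map(lambda elem: (None, elem))
            | "Group into batches" >> beam.GroupByKey()
            | "Drop key" >> beam.Map(lambda kv: kv[1])
            | "Write to GCS" >> beam.ParDo(
                WriteBatchToGcs(known_args.output_path))
        )


if __name__ == "__main__":
    run()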
Any ideas why the message doesn't make it from Pub/Sub to GCS?
Posting a comment by @RadRussian as an answer, since this could happen to other people as well:

There was another consumer reading from the same subscription, so no messages ever reached the pipeline running with the DirectRunner. In this case the competing consumer was a Dataflow job, but it could be anything.
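Pub/Sub load-balances messages across all clients attached to the same subscription, so another consumer can acknowledge messages before the local pipeline ever sees them. One way to avoid this is to give the local run its own subscription on the same topic, for example (the topic and subscription names below are placeholders):

gcloud pubsub subscriptions create dev-radim-dataflow-local --topic=<TOPIC>

Each subscription receives its own copy of every message published to the topic, so the local DirectRunner pipeline and the Dataflow job no longer compete for the same messages.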