Tags: python, google-cloud-platform, cloud, google-cloud-dataflow, apache-beam

When using Google Cloud Dataflow flex templates, is it possible to use a multi-command CLI to run a job?


I've read through the Google Cloud Dataflow (Python SDK v2.40) documentation on creating a Flex Template for a single job. All of the examples and documentation I have found map a single Python file to a single pipeline. However, I'd like one Python file to encapsulate multiple pipelines, giving me a more modular and documentable set of pipelines while minimizing the number of separate images I have to build. One approach I would normally use is a multi-command command-line interface:

pipeline_script.py pipeline1

for pipeline 1.

pipeline_script.py pipeline2

for pipeline 2, and so on.
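
For concreteness, a minimal sketch of what I mean, using argparse subcommands (the script and pipeline names are hypothetical):

```python
# pipeline_script.py -- hypothetical multi-command entry point.
import argparse


def run_pipeline1(argv):
    ...  # placeholder for the first pipeline


def run_pipeline2(argv):
    ...  # placeholder for the second pipeline


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="pipeline_script.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("pipeline1")
    subparsers.add_parser("pipeline2")

    # Keep unrecognized args (--runner, --project, ...) so they can be
    # forwarded to the selected pipeline.
    args, rest = parser.parse_known_args()
    {"pipeline1": run_pipeline1, "pipeline2": run_pipeline2}[args.command](rest)
```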

I see a single environment variable, FLEX_TEMPLATE_PYTHON_PY_OPTIONS, that might be useful, but the documentation is not clear on how it could help in my use case.
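
For context, a Flex Template image wires exactly one entry-point file through these environment variables. A minimal sketch of a typical Dockerfile (the file paths here are my assumptions):

```dockerfile
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

# Hypothetical layout: one script that contains all pipelines.
COPY pipeline_script.py /template/pipeline_script.py
COPY requirements.txt /template/requirements.txt

# The launcher runs a single main file per image.
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/pipeline_script.py
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=/template/requirements.txt

RUN pip install --no-cache-dir -r /template/requirements.txt
```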

In summary, I have multiple dataflow pipelines that I'd like to run from a single Flex Template image. Any pointers?


Solution

  • I don't believe there's a hard restriction limiting a given image to a single flex template, but each image does map to a single main file, as mentioned here.

    So, if you are able to update your main file to run different pipelines based on the metadata provided, you might be able to use the same image for multiple pipelines, for example:
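
As a minimal sketch of that idea (the --pipeline parameter and the pipeline names are assumptions on my part; Flex Template parameters reach the main file as ordinary command-line flags):

```python
# main.py -- one entry point, several pipelines, selected at launch time.
import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run_pipeline1(options):
    with beam.Pipeline(options=options) as p:
        p | beam.Create([1, 2, 3]) | beam.Map(print)


def run_pipeline2(options):
    with beam.Pipeline(options=options) as p:
        p | beam.Create(["a", "b"]) | beam.Map(print)


PIPELINES = {"pipeline1": run_pipeline1, "pipeline2": run_pipeline2}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--pipeline", required=True, choices=sorted(PIPELINES))
    # Everything else (--runner, --project, ...) belongs to Beam.
    args, beam_args = parser.parse_known_args()
    PIPELINES[args.pipeline](PipelineOptions(beam_args))
```

Launching with a template parameter such as pipeline=pipeline1 (declared in the template's metadata file) should then arrive at the main file as --pipeline=pipeline1 and select the corresponding pipeline.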