google-cloud-dataflow, apache-beam, apache-beam-io, apache-beam-internals

Can Apache Beam Pipeline be used for batch orchestration?


I am a newbie in the Apache Beam environment and am trying to fit an Apache Beam pipeline into my batch orchestration.

My definition of a batch is as follows:

Batch ==> a set of jobs
Job ==> can have one or more sub-jobs

There can be dependencies between jobs/sub-jobs.

Can an Apache Beam pipeline be mapped to my custom batch?


Solution

  • Apache Beam is a unified model for developing both batch and streaming pipelines, and those pipelines can be run on Dataflow. You can create and deploy your pipeline using Dataflow. Beam pipelines are also portable, so you can use any of the available runners according to your requirements (a minimal batch pipeline sketch follows this answer).

    Cloud Composer can be used for the batch orchestration you describe. Cloud Composer is built on Apache Airflow. Apache Beam and Apache Airflow can be used together, since Airflow can trigger Beam jobs. Because you have custom jobs with dependencies, you can configure Beam and Airflow together for batch orchestration (see the DAG sketch below).

    Airflow is meant for orchestration and pipeline dependency management, while Beam is used to build the data pipelines that are executed by data processing systems.
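
To make the batch side concrete, here is a minimal sketch of a Beam batch pipeline, assuming the apache-beam Python package is installed; the input/output file names are placeholders. A bounded source (such as a text file) is what makes the pipeline run as a batch job.

```python
# Minimal Beam batch pipeline sketch (file names and runner are placeholders).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # DirectRunner runs locally; swap in DataflowRunner (plus project/region
    # options) to execute the same pipeline on Dataflow.
    options = PipelineOptions(runner="DirectRunner")

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("input.txt")  # bounded source => batch
            | "CountChars" >> beam.Map(lambda line: len(line))
            | "Sum" >> beam.CombineGlobally(sum)
            | "Write" >> beam.io.WriteToText("output")
        )


if __name__ == "__main__":
    run()
```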
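For the orchestration side, here is a hedged sketch of how Airflow (for example on Cloud Composer) could trigger several Beam jobs and express dependencies between them, assuming the apache-airflow-providers-apache-beam package is installed; the DAG id, schedule, and py_file paths are placeholders for your own job scripts, and Dataflow project/region settings would go in dataflow_config or pipeline_options.

```python
# Sketch of an Airflow DAG that orchestrates Beam jobs (placeholders throughout).
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator

with DAG(
    dag_id="custom_batch",            # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,                    # trigger manually or set a cron expression
    catchup=False,
) as dag:
    # Each operator submits one Beam pipeline (a "job" / "sub-job" in your terms).
    job_a = BeamRunPythonPipelineOperator(
        task_id="job_a",
        py_file="gs://my-bucket/pipelines/job_a.py",  # placeholder path
        runner="DataflowRunner",
    )
    job_b = BeamRunPythonPipelineOperator(
        task_id="job_b",
        py_file="gs://my-bucket/pipelines/job_b.py",  # placeholder path
        runner="DataflowRunner",
    )
    job_c = BeamRunPythonPipelineOperator(
        task_id="job_c",
        py_file="gs://my-bucket/pipelines/job_c.py",  # placeholder path
        runner="DataflowRunner",
    )

    # Dependencies between jobs/sub-jobs are expressed with Airflow's >> operator:
    # job_a must finish before job_b and job_c run.
    job_a >> [job_b, job_c]
```

In this layout Airflow owns the dependency graph of your batch, and each node in that graph is an independent Beam pipeline, which matches the split described above: Airflow for orchestration, Beam for the data processing itself.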