Search code examples
concurrencyairflowdirected-acyclic-graphs

DAG-Concurrency per Worker in Apache Airflow


Hi friends of Apache Airflow

Is it possible to configure the number of DAG-Runs on each worker since worker concurrency only refers to tasks?

This is an example of the challenge I face:

Lets say I have a DAG called My-DAG with 2 parallel tasks called A and B.
I got 12 worker VM's for scaling things up.
Each machine can run one My-DAG (A and B in parallel) according to benchmarks.

I would configure the following:

  • parallelism = 32 (since thats enough for this example)
  • max_active_runs/max_active_runs_per_dag = 12 (1 per worker)
  • max_active_tasks_per_dag = 24 (12 DAG runs times 2 parallel tasks A and B)
  • worker-concurrency = 2 (2 tasks per worker with the assumption that one DAG runs per worker)

The last bullet point shows my dilemma.
If I think about it, it could happen that one worker runs twice the task A or twice the task B.

I know a new benchmark based on tasks would make sense, however I am really interested in knowing whether this would be possible and what the best approach would be.


Solution

  • I got a great solution from Jens Scheffler in the Apache Airflow Slack group.

    Implement two workers per VM and add a label with "queueA" and "queueB" to the workers and ensure that taks of type A are only routed to "queueA" workers and vice versa

    As a result, I would need to set the worker-concurrency to 1.