Hi friends of Apache Airflow
Is it possible to configure the number of DAG runs on each worker, given that worker concurrency only refers to tasks?
This is an example of the challenge I face:
Let's say I have a DAG called My-DAG with two parallel tasks called A and B.
I have 12 worker VMs for scaling out.
According to my benchmarks, each machine can run one instance of My-DAG (A and B in parallel).
I would configure the following:
The last bullet point shows my dilemma.
If I think about it, it could happen that one worker runs task A twice, or task B twice.
I know a new benchmark based on tasks would make sense; however, I am really interested in whether this is possible and, if so, what the best approach would be.
I got a great solution from Jens Scheffler in the Apache Airflow Slack group.
Implement two workers per VM, label one with "queueA" and the other with "queueB", and ensure that tasks of type A are only routed to "queueA" workers and tasks of type B only to "queueB" workers.
As a result, I would need to set worker_concurrency to 1 on each worker.
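A minimal sketch of that setup, assuming Airflow 2.x with the CeleryExecutor (DAG and task names taken from the example above; the bash commands are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Route each task type to its own queue so that a worker listening
# only on "queueA" can ever pick up A-tasks, and likewise for B.
with DAG(
    dag_id="My-DAG",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    a = BashOperator(task_id="A", bash_command="echo A", queue="queueA")
    b = BashOperator(task_id="B", bash_command="echo B", queue="queueB")

# On each VM, start two Celery workers, each pinned to one queue
# with a concurrency of 1, e.g.:
#   airflow celery worker --queues queueA --concurrency 1
#   airflow celery worker --queues queueB --concurrency 1
```

With 12 VMs this yields 12 "queueA" slots and 12 "queueB" slots, so no worker can ever run the same task type twice at once, which matches the one-My-DAG-per-machine benchmark.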