airflow, airflow-scheduler, airflow-2.x

Does Airflow with multiple schedulers and the LocalExecutor provide a complete solution for horizontal scaling?


When using the LocalExecutor, are Airflow tasks processed by local workers that reside in the scheduler component? If so, and each scheduler runs tasks with its own local workers, does setting up Airflow with multiple schedulers provide horizontal scaling even with the LocalExecutor?


Solution

  • With Airflow's LocalExecutor, the scheduler process spawns (or forks) subprocesses that run the tasks. By default at most 32 run at once; the limit is configured by AIRFLOW__CORE__PARALLELISM.

    Before it was possible to run multiple schedulers, the LocalExecutor was really meant for a single machine. With multiple schedulers, it is indeed possible to scale horizontally. However, scaling out the "brains" rather than (or, more precisely, together with) the "workers" is uncommon in distributed systems.

    I don't have concrete numbers on how this performs for Airflow, but for example Astronomer limits the number of schedulers to 4, GCP Composer to 10, and AWS MWAA to 5. On all of those services you can scale the number of workers beyond that. When you need horizontal scalability, I'd choose the CeleryExecutor or KubernetesExecutor instead. Both were developed with horizontal scalability in mind from the start and can run and scale out tasks on machines separate from the scheduler process.

    Note: The term "workers" only applies to the CeleryExecutor, where you start separate "worker" processes that perform tasks.
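
To make the capacity arithmetic behind this answer concrete, here is a minimal Python sketch. It assumes the parallelism limit applies per scheduler, which is how the Airflow configuration reference describes it, so that N schedulers running the LocalExecutor can run at most N times that many tasks at once. The function name `max_local_task_slots` and its `env` parameter are hypothetical helpers for illustration, not part of Airflow.

```python
import os


def max_local_task_slots(num_schedulers, env=None):
    """Rough upper bound on concurrently running tasks across schedulers.

    Assumption: every scheduler runs the LocalExecutor and the parallelism
    limit applies to each scheduler separately.
    """
    env = os.environ if env is None else env
    # 32 is the default mentioned above; the environment variable overrides it.
    parallelism = int(env.get("AIRFLOW__CORE__PARALLELISM", "32"))
    if num_schedulers < 1 or parallelism < 1:
        # This sketch only handles positive scheduler counts and limits.
        raise ValueError("num_schedulers and parallelism must be positive")
    return num_schedulers * parallelism


if __name__ == "__main__":
    for schedulers in (1, 2, 4):
        slots = max_local_task_slots(schedulers)
        print(f"{schedulers} scheduler(s): at most {slots} concurrent tasks")
```

The bound grows only by adding whole schedulers, which is the trade-off described above: with the CeleryExecutor or KubernetesExecutor, task capacity can be scaled independently of the scheduler count.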