We are using Airflow v1.9.0. We have 100+ DAGs and the instance is really slow. The scheduler is only launching some of the tasks.
In order to reduce CPU usage, we want to tweak some configuration parameters, namely min_file_process_interval and dag_dir_list_interval. The documentation is not really clear about the difference between the two:
min_file_process_interval:
In cases where there are only a small number of DAG definition files, the loop could potentially process the DAG definition files many times a minute. To control the rate of DAG file processing, the min_file_process_interval can be set to a higher value. This parameter ensures that a DAG definition file is not processed more often than once every min_file_process_interval seconds.
dag_dir_list_interval:
Since the scheduler can run indefinitely, it's necessary to periodically refresh the list of files in the DAG definition directory. The refresh interval is controlled with the dag_dir_list_interval configuration parameter.
Source: A Google search on both terms led to this as the first result: https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics
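
To make the question concrete, here is how we currently read the two descriptions, written as a toy Python model. This is only a sketch of our understanding, not Airflow's actual scheduler code: the simulate function, the one-second tick, and the files_on_disk callback are hypothetical names made up for illustration.

```python
def simulate(total_seconds, min_file_process_interval,
             dag_dir_list_interval, files_on_disk):
    """Toy model of a scheduler loop that ticks once per second.

    files_on_disk is a hypothetical callback: given the current second,
    it returns the set of DAG file names present in the DAG folder.
    Returns (number of directory listings, {file name: times parsed}).
    """
    known_files = set()     # files the loop currently knows about
    last_listed = None      # when the folder was last listed
    last_processed = {}     # file name -> when it was last parsed
    listings = 0
    parse_counts = {}

    for now in range(total_seconds):
        # dag_dir_list_interval: how often the folder is re-listed,
        # i.e. how quickly new or deleted files are noticed.
        if last_listed is None or now - last_listed >= dag_dir_list_interval:
            known_files = set(files_on_disk(now))
            last_listed = now
            listings += 1

        # min_file_process_interval: minimum gap between two parses
        # of the same, already known, file.
        for path in sorted(known_files):
            last = last_processed.get(path)
            if last is None or now - last >= min_file_process_interval:
                last_processed[path] = now
                parse_counts[path] = parse_counts.get(path, 0) + 1

    return listings, parse_counts


def files_on_disk(now):
    # b.py is dropped into the folder 10 seconds after start.
    return {"a.py"} if now < 10 else {"a.py", "b.py"}


listings, parses = simulate(
    total_seconds=600,
    min_file_process_interval=30,
    dag_dir_list_interval=300,
    files_on_disk=files_on_disk,
)
print(listings, parses)  # 2 {'a.py': 20, 'b.py': 10}
```

Under this reading, b.py is not parsed at all until the folder is listed again at second 300, even though a.py is re-parsed every 30 seconds in the meantime. Is that the correct interpretation of the two parameters?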