What is the difference between min_file_process_interval and dag_dir_list_interval in Apache Airflow 1.9.0?


We are using Airflow 1.9.0. We have 100+ DAGs and the instance is really slow. The scheduler is only launching some of the tasks.

To reduce CPU usage, we want to tweak two configuration parameters: min_file_process_interval and dag_dir_list_interval. The documentation is not really clear about the difference between the two.


Solution

  • min_file_process_interval:

    In cases where there are only a small number of DAG definition files, the loop could potentially process the DAG definition files many times a minute. To control the rate of DAG file processing, the min_file_process_interval can be set to a higher value. This parameter ensures that a DAG definition file is not processed more often than once every min_file_process_interval seconds.

  • dag_dir_list_interval:

    Since the scheduler can run indefinitely, it's necessary to periodically refresh the list of files in the DAG definition directory. The refresh interval is controlled with the dag_dir_list_interval configuration parameter.

    Source: a Google search on both terms led to this first result: https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics
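
To make the difference concrete, here is a toy model of the two intervals described above. It is only a sketch and not Airflow's actual scheduler code: the `simulate` function, the `directory_at` callback, the one-second clock, and the file names are all made up for illustration. In a real installation both values are set in the `[scheduler]` section of airflow.cfg.

```python
# Toy model of the two intervals; NOT Airflow's real scheduler code.
# All names below (simulate, directory_at, a.py, b.py) are hypothetical.

def simulate(duration, min_file_process_interval, dag_dir_list_interval,
             directory_at):
    """Step a fake clock one second at a time and record what happens.

    directory_at(t) returns the DAG files present on disk at second t.
    """
    known_files = []      # file list from the most recent directory listing
    last_listed = None    # second at which the directory was last listed
    last_processed = {}   # per-file second of the most recent parse
    events = []           # (second, action, file name or None)

    for now in range(duration):
        # dag_dir_list_interval: how often the directory is re-listed,
        # i.e. how quickly new or deleted files are noticed.
        if last_listed is None or now - last_listed >= dag_dir_list_interval:
            known_files = list(directory_at(now))
            last_listed = now
            events.append((now, "list", None))

        # min_file_process_interval: minimum gap between two parses of
        # the same, already known file.
        for name in known_files:
            previous = last_processed.get(name)
            if previous is None or now - previous >= min_file_process_interval:
                last_processed[name] = now
                events.append((now, "process", name))
    return events


def directory_at(t):
    # Hypothetical directory content: b.py appears on disk at second 10.
    return ["a.py"] if t < 10 else ["a.py", "b.py"]


events = simulate(duration=120, min_file_process_interval=30,
                  dag_dir_list_interval=60, directory_at=directory_at)

list_times = [t for t, action, _ in events if action == "list"]
a_times = [t for t, action, name in events
           if action == "process" and name == "a.py"]
b_times = [t for t, action, name in events
           if action == "process" and name == "b.py"]

print("directory listed at:", list_times)
print("a.py processed at:", a_times)
print("b.py processed at:", b_times)
```

In this run the directory is listed at seconds 0 and 60, and a.py is parsed at seconds 0, 30, 60 and 90. b.py is on disk from second 10 but is first parsed at second 60, because that is the next directory listing. So dag_dir_list_interval governs how soon new files are picked up, while min_file_process_interval governs how often each known file is re-parsed, and raising the latter is the one more likely to cut parsing CPU.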