Airflow delay between catchup instances


I have the DAG settings below to run catchup from 2015. For each execution date, the task instance completes in under a minute. However, the next day's task only starts on 5-minute boundaries, e.g. 10:00 AM, 10:05 AM, 10:10 AM. I have not specified a 5-minute interval anywhere for task instances. How can I modify the DAG so that each run triggers as soon as the previous instance finishes? I'm using Airflow version 1.9.0.

from datetime import datetime, timedelta

from airflow import DAG

# jira_failure_ticket is my own failure callback, defined elsewhere.
default_args = {
    'owner': 'ssnehalatha',
    'email': ['[email protected]'],
    'depends_on_past': False,
    'start_date': datetime(2015, 1, 1),
    'on_failure_callback': jira_failure_ticket,
    'trigger_rule': 'all_done',
    'retries': 1,
    'pool': 'python_sql_pool'
}

dag = DAG('daily_dag',
          schedule_interval='15 1 * * 0,1,2,3,4,5',  # 01:15, Sunday through Friday
          default_args=default_args,
          dagrun_timeout=timedelta(hours=24),
          catchup=True)

Solution

  • If I am not mistaken, this is connected to the scheduler settings in airflow.cfg.

    [scheduler]
    
    # The scheduler constantly tries to trigger new tasks (look at the
    # scheduler section in the docs for more information). This defines
    # how often the scheduler should run (in seconds).
    scheduler_heartbeat_sec = 60
    

    EDIT

    The docs for the two parameters you mentioned (from https://github.com/apache/incubator-airflow/blob/master/UPDATING.md):

    min_file_process_interval: After how much time should an updated DAG be picked up from the filesystem.

    dag_dir_list_interval: The frequency with which the scheduler should relist the contents of the DAG directory. If, while developing DAGs, they are not being picked up, have a look at this number and decrease it when necessary.

    It seems to me they are more about detecting changed and new DAG files than about executing tasks.
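To compare the three settings discussed above in one place, here is a minimal sketch that reads them from an airflow.cfg-style text using only the Python standard library. The sample values and the helper names (`scheduler_intervals`, `env_override_name`) are assumptions made for illustration, not recommended settings or part of Airflow's API. Whether lowering `scheduler_heartbeat_sec` actually removes the delay between catchup runs is not confirmed here.

```python
import configparser

# Illustrative excerpt only; these values are assumptions, not recommendations.
SAMPLE_CFG = """
[scheduler]
scheduler_heartbeat_sec = 60
min_file_process_interval = 0
dag_dir_list_interval = 300
"""

KEYS = (
    "scheduler_heartbeat_sec",
    "min_file_process_interval",
    "dag_dir_list_interval",
)


def scheduler_intervals(cfg_text):
    """Return the interval settings (in seconds) present in [scheduler]."""
    # interpolation=None avoids errors on '%' characters elsewhere in the file.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        key: parser.getint("scheduler", key)
        for key in KEYS
        if parser.has_option("scheduler", key)
    }


def env_override_name(section, key):
    """Build the AIRFLOW__SECTION__KEY environment variable name for a setting."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


intervals = scheduler_intervals(SAMPLE_CFG)
for key, seconds in intervals.items():
    print("{} = {}s (env override: {})".format(
        key, seconds, env_override_name("scheduler", key)))
```

To inspect a real installation, pass the contents of your own airflow.cfg to `scheduler_intervals` instead of `SAMPLE_CFG`. The environment variable form is useful for trying a different value without editing the file.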