python workflow airflow

Understanding the tree view in Apache Airflow


I set up the DAG from https://airflow.apache.org/tutorial.html as is; the only change is that I set the DAG to run at an interval of 5 minutes with a start date of 2017-12-17T13:40:00 UTC. I enabled the DAG before 13:40, so there was no backfill, and my machine is running on UTC. The DAG ran as expected (i.e. at an interval of 5 minutes, starting at 13:45 UTC).

Now, when I go to the tree view, I fail to understand the graph. There are 3 tasks in total. 'sleep' (t2) has its upstream set to 'print_date' (t1), and 'templated' (t3) also has its upstream set to 'print_date' (t1). Then why does the graph show two 'print_date' nodes? Are they separate task instances of that task? If yes, how do I make sure that only one task instance of t1 runs (diamond pattern)? There are also 4 green rectangular boxes (including two for 'print_date') instead of 3.

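# The imports and DAG definition are not shown in the question; this is an assumed
# reconstruction from the description above (Airflow 1.x tutorial DAG, 5-minute interval).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'tutorial',
    start_date=datetime(2017, 12, 17, 13, 40),
    schedule_interval=timedelta(minutes=5))

# t1, t2 and t3 are examples of tasks created by instantiating operators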
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)

t2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    retries=3,
    dag=dag)

templated_command = """
    {% for i in range(5) %}
        echo "{{ ds }}"
        echo "{{ macros.ds_add(ds, 7)}}"
        echo "{{ params.my_param }}"
    {% endfor %}
"""

t3 = BashOperator(
    task_id='templated',
    bash_command=templated_command,
    params={'my_param': 'Parameter I passed in'},
    dag=dag)

t2.set_upstream(t1)
t3.set_upstream(t1)

Second, why do the times above the DAG runs (green circles) show 8:40, 8:45, and so on? What time/timezone is that? I have set start_date for the DAG to 13:40 and my machine is set to UTC.



Solution

  • They are not separate instances. You can see this:

    1. In Tree View, the start/end dates and duration shown for both print_date entries will be exactly the same.

    2. In Gantt view, you will see the duration for only a single instance of print_date.

    In general, you can't draw a DAG as a tree without duplicating nodes the way the tree view does: a task shared by several branches, like print_date here, has to appear once in each branch.
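
To make the duplication concrete, here is a minimal sketch in plain Python (no Airflow import). It assumes the tree is built by starting from the tasks that have nothing downstream and walking upstream from each one; the `upstream` dict and the helper names `unroll` and `tree_rows` are made up for illustration and are not Airflow APIs.

```python
# Hypothetical stand-in for the DAG in the question: task_id -> list of upstream task_ids.
upstream = {
    "print_date": [],
    "sleep": ["print_date"],
    "templated": ["print_date"],
}


def unroll(task, upstream, depth=0):
    """Return (depth, task_id) rows for the tree rooted at `task`, expanding upstream."""
    rows = [(depth, task)]
    for parent in upstream[task]:
        rows.extend(unroll(parent, upstream, depth + 1))
    return rows


def tree_rows(upstream):
    """Unroll the whole graph, starting from tasks that are nobody's upstream."""
    has_downstream = {p for parents in upstream.values() for p in parents}
    leaves = [t for t in upstream if t not in has_downstream]
    rows = []
    for leaf in leaves:
        rows.extend(unroll(leaf, upstream))
    return rows


rows = tree_rows(upstream)
for depth, task in rows:
    print("  " * depth + task)
```

Under these assumptions the tree has four rows for three distinct tasks: print_date is listed once under sleep and once under templated, but it is still a single task with a single task instance per DAG run.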