airflow, amazon-ecs

Airflow DAG tasks not running when I run the DAG, despite tasks working fine when I test them


I have the following DAG defined in code:

from datetime import timedelta, datetime
import airflow
from airflow import DAG
from airflow.operators.docker_operator import DockerOperator
from airflow.contrib.operators.ecs_operator import ECSOperator

default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'start_date': datetime(2018, 9, 24, 10, 00, 00)
}

dag = DAG(
    'data-push',
    default_args=default_args,
    schedule_interval='0 0 * * 1,4',
)    


colors = ['blue', 'red', 'yellow']
for color in colors:
    ECSOperator(
        dag=dag,
        task_id='data-push-for-%s' % color,
        task_definition='generic-push-colors',
        cluster='MY_ECS_CLUSTER_ARN',
        launch_type='FARGATE',
        overrides={
            'containerOverrides': [
                {
                    'name': 'push-colors-container',
                    'command': [color]
                }
            ]
        },
        region_name='us-east-1',
        network_configuration={
            'awsvpcConfiguration': {
                'securityGroups': ['MY_SG'],
                'subnets': ['MY_SUBNET'],
                'assignPublicIp': 'ENABLED'
            }
        },
    )

This should create a DAG with 3 tasks, one for each color in my colors list.

This seems to work: when I run:

airflow list_dags

I see my DAG listed:

data-push

And when I run:

airflow list_tasks data-push

I see my three tasks appear as they should:

data-push-for-blue
data-push-for-red
data-push-for-yellow

I then test-run one of my tasks by entering the following into the terminal:

airflow run data-push data-push-for-blue 2017-1-23

And this runs the task, which I can see appear in my ECS cluster on the AWS dashboard, so I know for a fact that the task runs on my ECS cluster, the data is pushed successfully, and everything is great.

The problem comes when I try to run the DAG data-push from the Airflow UI.

I run:

airflow initdb

followed by:

airflow webserver

and then go to the Airflow UI at localhost:8080.

I see the DAG data-push in the list of DAGs and click it. To test-run the entire DAG, I click the "Trigger DAG" button, don't add any configuration JSON, and then click 'Trigger'. The tree view for the DAG then shows a green circle on the right of the tree structure, seemingly indicating that the DAG is 'running'. But the green circle just stays there for ages, and when I manually check my ECS dashboard I see no tasks actually running. So nothing happens after triggering the DAG from the Airflow UI, despite the tasks working when I run them manually from the CLI.
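For reference, I believe the state of the stuck run can also be inspected from the CLI with something like the following (the execution date here is just a placeholder for whatever date the triggered run was given):

airflow dag_state data-push 2018-09-27T00:00:00
airflow task_state data-push data-push-for-blue 2018-09-27T00:00:00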

I am using the SequentialExecutor if that matters.
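(SequentialExecutor is the default; as far as I understand, the executor is set in airflow.cfg, roughly like this:)

[core]
executor = SequentialExecutor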

My two main theories as to why triggering the DAG does nothing, while running the individual tasks from the CLI works, are: (1) I am missing something in the Python code where I define the DAG (maybe because I don't specify any dependencies for the tasks? a sketch of what that would look like is below), or (2) I am not running the Airflow scheduler; but if I am manually triggering the DAG from the Airflow UI, I don't see why the scheduler would need to be running, or why the UI wouldn't show me an error saying that this is the problem.
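For clarity, this is roughly what I imagine explicit dependencies would look like if I added them, reusing the same arguments as above (the tasks are currently all independent, which I assumed was still valid):

tasks = []
for color in colors:
    tasks.append(
        ECSOperator(
            dag=dag,
            task_id='data-push-for-%s' % color,
            task_definition='generic-push-colors',
            cluster='MY_ECS_CLUSTER_ARN',
            launch_type='FARGATE',
            overrides={
                'containerOverrides': [
                    {'name': 'push-colors-container', 'command': [color]}
                ]
            },
            region_name='us-east-1',
            network_configuration={
                'awsvpcConfiguration': {
                    'securityGroups': ['MY_SG'],
                    'subnets': ['MY_SUBNET'],
                    'assignPublicIp': 'ENABLED'
                }
            },
        )
    )

# Chain the tasks so they run one after another: blue >> red >> yellow
for upstream, downstream in zip(tasks, tasks[1:]):
    upstream >> downstream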

Any ideas?


Solution

  • Sounds like you did not unpause your DAG: toggle the On/Off switch in the upper left of the web UI, or use the CLI: airflow unpause <dag_id>.
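Concretely, for the DAG above that would be:

airflow unpause data-push

Note that a scheduler process (airflow scheduler) must also be running for the tasks of a triggered run to actually be queued and sent to ECS, even when the run is triggered manually from the UI.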