Some of my DAGs are waiting to get scheduled, and some are waiting in the queue. I suspect there are reasons for this delay, but I'm not sure how to start debugging the problem. The majority of the pipelines are running Spark jobs.
Can someone give me some direction on where to look to 1) analyse which DAGs were delayed (did not start at the scheduled time) and 2) find out whether the resources are enough? I'm quite new to scheduling in Airflow. Many thanks. Please let me know if I can describe the question better.
If you are looking for code that takes advantage of Airflow's wider capabilities, there are three classes within airflow.models (DagBag, DagModel and DagRun) which can be harnessed.
So you can see what is going on under the hood: each DagRun record carries the fields id, state, dag_id, queued_at, execution_date, start_date, end_date, run_id, data_interval_start, data_interval_end and last_scheduling_decision.
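For part 1 of your question (which runs did not start on time), those timestamps are the key: the scheduler creates a scheduled run at the end of its data interval, so the gap between data_interval_end (or queued_at) and start_date is roughly the scheduling delay. A minimal sketch of that comparison, assuming Airflow 2.2+ for the data-interval fields (report_scheduling_delay is just an illustrative name):

from airflow.models import DagRun

def report_scheduling_delay(dag_id):
    # start_date is when the scheduler actually started the run;
    # data_interval_end is when a scheduled run becomes due, so the
    # difference approximates how late the run started.
    for run in DagRun.find(dag_id=dag_id):
        if run.start_date and run.data_interval_end:
            delay = run.start_date - run.data_interval_end
            print(run.run_id, run.state, 'delay:', delay)

The fuller example below wraps a similar inspection in a DAG of its own, so you can trigger it from the UI and read the output in the task log.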
from datetime import datetime
from airflow import DAG
from airflow.models import DagBag, DagModel, DagRun
from airflow.operators.python import PythonOperator

# Walk every DAG in the DagBag; for the ones that are not paused,
# print the state and timestamps of each run, newest first.
def check_dag_active():
    bag = DagBag()
    for dag_id in bag.dags:
        in_bag = DagModel.get_current(dag_id)
        if in_bag and not in_bag.is_paused:
            latest = DagRun.find(dag_id=dag_id)
            latest.sort(key=lambda x: x.execution_date, reverse=True)
            for i, run in enumerate(latest):
                print(i, run.state, run.execution_date, run.start_date)
                # if run.state == 'queued':
                #     return [run.dag_id, run.execution_date, run.start_date]

with DAG(
    'stack_overflow_ans_3',
    tags=['SO'],
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
    is_paused_upon_creation=False,
) as dag:

    t1 = PythonOperator(
        task_id='task_that_will_fail',
        python_callable=check_dag_active,
    )
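For part 2 (whether the resources are enough), tasks that sit in the queued state usually point at concurrency limits: every task instance occupies a pool slot, and core settings such as parallelism and max_active_tasks_per_dag cap how much can run at once. A sketch of a pool capacity check, again assuming Airflow 2.x (check_pool_capacity is a name I made up):

from airflow.models import Pool
from airflow.utils.session import create_session

def check_pool_capacity():
    # Compare total vs. open slots per pool; a pool with zero open
    # slots and a non-zero queued count is a capacity bottleneck.
    with create_session() as session:
        for pool in session.query(Pool).all():
            print(
                pool.pool,
                'total:', pool.slots,
                'open:', pool.open_slots(session=session),
                'running:', pool.running_slots(session=session),
                'queued:', pool.queued_slots(session=session),
            )

If the pools show open slots but your Spark tasks still sit queued, the bottleneck is more likely executor capacity (worker slots, or the cluster the Spark jobs land on), which you would check outside Airflow.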