I am new to Apache Airflow. I have some DAGs already running in Airflow, and now I want to add SLAs to them so that I can track and monitor the tasks and get an alert if something breaks.
I know how to add SLAs to a DAG's default_args using timedelta(), like below:
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'sla': timedelta(minutes=30)
}
But I have the following questions:
Can we specify an SLA for the whole DAG, or only for tasks individually?
What would be an appropriate SLA time for a DAG that runs for 30 minutes?
What would be an appropriate SLA time for a task that runs for 5 minutes?
Do we need to consider retry_delay as well while specifying the SLA?
Can we specify an SLA for the whole DAG, or only for tasks individually?
I believe SLAs are provisioned only for individual tasks and not for the DAG as a whole. But I think the same effect is achievable (can't say for sure, though) for the entire DAG by creating a task at the end (a DummyOperator) that depends on all other tasks of your DAG, and setting an SLA on that closing task.
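Here's a minimal sketch of that closing-task idea, assuming Airflow 1.x import paths; the DAG id, task ids and bash commands are just placeholders:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

with DAG('my_dag', start_date=datetime(2015, 6, 1), schedule_interval='@daily') as dag:

    task_a = BashOperator(task_id='task_a', bash_command='echo a')
    task_b = BashOperator(task_id='task_b', bash_command='echo b')

    # Closing task that every real task feeds into; its SLA effectively
    # behaves like an SLA on the whole DAG run.
    dag_complete = DummyOperator(
        task_id='dag_complete',
        sla=timedelta(minutes=30),
    )

    [task_a, task_b] >> dag_complete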
What would be an appropriate SLA time for a DAG that runs for 30 minutes?
This would entirely depend on factors like the criticality of your task, its failure rate, etc. But I would suggest that you begin with a 'strict-enough' timedelta (like 5 minutes) and then tune it (increase or decrease) from there.
What would be an appropriate SLA time for a task that runs for 5 minutes?
Same as above: start with 1 minute and tune from there.
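Whichever values you settle on, note that sla can be set per task as well as in default_args, so each task can carry its own target. A sketch under the same Airflow 1.x assumptions (DAG id, task ids and commands are placeholders):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2015, 6, 1),
    'sla': timedelta(minutes=30),   # default SLA for every task in the DAG
}

with DAG('sla_example', default_args=default_args, schedule_interval='@daily') as dag:

    # Inherits sla=timedelta(minutes=30) from default_args
    task_a = BashOperator(task_id='task_a', bash_command='echo a')

    # Overrides the inherited default with its own, tighter SLA
    task_b = BashOperator(
        task_id='task_b',
        bash_command='echo b',
        sla=timedelta(minutes=10),
    )

    task_a >> task_b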
Do we need to consider retry_delay as well while specifying the SLA?
Going by the docs, I'd say yes:
:param sla: time by which the job is expected to succeed. Note that
this represents the ``timedelta`` after the period is closed. For
example if you set an SLA of 1 hour, the scheduler would send an email
soon after 1:00AM on the ``2016-01-02`` if the ``2016-01-01`` instance
has not succeeded yet.
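Since retries and retry_delay both eat into the window before the task finally succeeds, a quick back-of-the-envelope check can help when picking the value. This is a sketch of the arithmetic only, using the 5-minute task and the retries/retry_delay from the question's default_args:

from datetime import timedelta

expected_runtime = timedelta(minutes=5)   # typical task duration
retries = 1                               # from default_args
retry_delay = timedelta(minutes=5)        # from default_args

# Worst case before the task finally succeeds: every attempt but the last
# fails, and each failure is followed by a retry_delay wait.
worst_case = (retries + 1) * expected_runtime + retries * retry_delay

sla = worst_case + timedelta(minutes=5)   # extra buffer; tune from here
print(worst_case, sla)                    # 0:15:00 0:20:00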