Tags: airflow, schedule, airflow-scheduler

Airflow DAG problem: scheduled run executes in a loop


I created this DAG to run a command on a remote host over SSH on a schedule.

from datetime import timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago
from airflow.contrib.hooks.ssh_hook import SSHHook as sscon
from airflow.contrib.operators.ssh_operator import SSHOperator


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(0),
    'email': ['e@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    #'retries': 1,
    #'retry_delay': timedelta(minutes=10),
}
dag = DAG(
    'ssh_second',
    default_args=default_args,
    description='A simple bash DAG LAB',
    schedule_interval=timedelta(minutes=1),
    tags=['test'],
)

sshcon = sscon(remote_host="192.168.1.250", username="user", password="password", port=22)

t1 = BashOperator(
    task_id='echo1',
    bash_command='echo "simple task! by dag" ',
    dag=dag,
)

t5 = SSHOperator(
    task_id="remote-connection",
    command="/bin/date >> /home/user/date.txt && echo 'from airflow' >> /home/user/date.txt",
    ssh_hook=sshcon,
    dag=dag)

t1 >> t5

It works, but Airflow runs it every second, back to back, and it never stops.

(screenshot: tree view showing the looping runs)

I don't know where the problem comes from!


Solution

  • Your schedule_interval is set to every minute: schedule_interval=timedelta(minutes=1), which is what's causing the DAG to execute so frequently. On top of that, because your start_date is dynamic (days_ago(0)) and catchup is enabled by default, the scheduler backfills every "missed" interval, so the runs appear back to back. Update the schedule_interval to something less frequent.

    Also, I would highly recommend setting the start_date to a static value rather than a dynamic one (e.g. days_ago(0)). This guide, and others, were really useful when I started with Airflow. Once you have a static start_date, you can set catchup=False if you do not want the DAG to execute the "missing" runs between the start_date and the present.
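
    As a sketch, the relevant parts of your DAG would look something like this (the hourly interval and the 2021-01-01 date are just example values, pick whatever fits your use case):

    ```python
    from datetime import datetime, timedelta
    from airflow import DAG

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        # Static start_date instead of the dynamic days_ago(0)
        'start_date': datetime(2021, 1, 1),
        'email': ['e@example.com'],
        'email_on_failure': False,
        'email_on_retry': False,
    }

    dag = DAG(
        'ssh_second',
        default_args=default_args,
        description='A simple bash DAG LAB',
        # Less frequent than every minute, e.g. hourly
        schedule_interval=timedelta(hours=1),
        # Do not backfill the "missing" runs between start_date and now
        catchup=False,
        tags=['test'],
    )
    ```

    With catchup=False the scheduler only creates a run for the latest interval, so you no longer get the flood of back-to-back runs shown in your tree view.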