python, apache-spark, airflow, scheduler, directed-acyclic-graphs

Airflow execution_date wrong value


I have to run a Spark job, and we have to pass a date as an argument to that job so that it reads the right directory. I am using Airflow to schedule the job. Some details are below.

start_date

import pendulum
from datetime import datetime

local_tz = pendulum.timezone("Asia/Kolkata")
# entry in the DAG's arguments dict
'start_date': datetime(year=2020, month=8, day=3, tzinfo=local_tz)

schedule_interval

schedule_interval='20 0 * * *'

value to pass in job

{{ (execution_date + macros.timedelta(hours=5,minutes=30) - macros.timedelta(days=1)).strftime("%Y/%m/%d") }}

We have to run this job at midnight for the previous day, but this expression gives me the date for the day before yesterday. I added 5:30 because our Airflow uses UTC time.

Can anybody explain what is happening here, with a reference?

Thanks


Solution

  • Below is the definition of the execution date:

    The execution time in Airflow is not the actual run time, but rather the start timestamp of its schedule period. For example, the execution time of the first DAG run is 2019–12–05 7:00:00, though it is executed on 2019–12–06. (Dec 9, 2019)
    

    taken from https://towardsdatascience.com/apache-airflow-tips-and-best-practices-ff64ce92ef8#:~:text=The%20execution%20time%20in%20Airflow,on%202019%E2%80%9312%E2%80%9306.

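    You don't need the - macros.timedelta(days=1) term in your value. The execution date already marks the start of the schedule period, which for a daily schedule is the previous day, so subtracting another day moves the result back to the day before yesterday. Keeping the rest of the expression as it is, the value becomes:

    {{ (execution_date + macros.timedelta(hours=5,minutes=30)).strftime("%Y/%m/%d") }}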
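
The date arithmetic can be checked outside Airflow with plain datetime objects. The sketch below is only an illustration. It assumes a daily schedule that fires at 00:20 IST and that the execution date is the start of the period that just ended, as in the definition quoted above. The helper execution_date_for and the fixed-offset IST zone are hypothetical stand-ins, not Airflow APIs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +05:30 offset stands in for Asia/Kolkata (India has no DST).
IST = timezone(timedelta(hours=5, minutes=30))


def execution_date_for(actual_run_local):
    """Hypothetical helper: for a daily schedule, the execution date is the
    start of the period that just ended, i.e. one day before the actual run.
    The result is converted to UTC, which is how the question's Airflow
    instance reports execution_date."""
    return (actual_run_local - timedelta(days=1)).astimezone(timezone.utc)


# The run that actually fires at 00:20 IST on 2020-08-05.
actual_run = datetime(2020, 8, 5, 0, 20, tzinfo=IST)
execution_date = execution_date_for(actual_run)

ist_shift = timedelta(hours=5, minutes=30)

# Expression from the question: shifts to IST, then subtracts one more day.
original = (execution_date + ist_shift - timedelta(days=1)).strftime("%Y/%m/%d")

# Expression without the extra subtraction.
corrected = (execution_date + ist_shift).strftime("%Y/%m/%d")

print(execution_date.isoformat())  # 2020-08-03T18:50:00+00:00
print(original)                    # 2020/08/03 (day before yesterday)
print(corrected)                   # 2020/08/04 (previous day)
```

For a run on 2020-08-05, the original expression yields 2020/08/03, while dropping the extra day yields 2020/08/04, which is the previous day the job is meant to process.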