I have to run a Spark job, and we have to pass a date as an argument so the job reads that day's directory. I am using Airflow to schedule the job. Below is some info.
start_date
import pendulum
from datetime import datetime

local_tz = pendulum.timezone("Asia/Kolkata")
'start_date': datetime(year=2020, month=8, day=3, tzinfo=local_tz)
schedule_interval
schedule_interval='20 0 * * *'
value to pass in job
{{ (execution_date + macros.timedelta(hours=5,minutes=30) - macros.timedelta(days=1)).strftime("%Y/%m/%d") }}
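For context, here is a minimal sketch of how such a templated value might be wired into the job; the BashOperator, task id, and script path are assumptions of mine, not part of the original setup:

from airflow.operators.bash_operator import BashOperator

# bash_command is a templated field, so the Jinja expression below is
# rendered with the run's execution_date before the command executes.
run_job = BashOperator(
    task_id='run_spark_job',                 # hypothetical task id
    bash_command=(
        'spark-submit /path/to/job.py '      # hypothetical script path
        '{{ (execution_date + macros.timedelta(hours=5, minutes=30) '
        '- macros.timedelta(days=1)).strftime("%Y/%m/%d") }}'
    ),
    dag=dag,
)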
We have to run this job just after midnight and process the previous day, but this expression is giving me the date of the day before yesterday. I added 5:30 because our Airflow instance uses UTC time.
Can anybody explain what is happening here, with a reference?
Thanks
Below is the definition of execution date:
The execution time in Airflow is not the actual run time, but rather the start timestamp of its schedule period. For example, the execution time of the first DAG run is 2019-12-05 7:00:00, though it is executed on 2019-12-06.
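To see why the expression lands a day too early, here is the arithmetic spelled out (a sketch with illustrative dates; Airflow hands over execution_date in UTC):

from datetime import datetime, timedelta

# The run that actually fires at 2020-08-05 00:20 IST covers the interval
# 2020-08-04 00:20 IST to 2020-08-05 00:20 IST, so its execution_date is
# the *start* of that interval, stored in UTC:
execution_date = datetime(2020, 8, 3, 18, 50)   # == 2020-08-04 00:20 IST

value = (execution_date
         + timedelta(hours=5, minutes=30)   # UTC -> IST shift
         - timedelta(days=1)                # the extra day subtraction
         ).strftime("%Y/%m/%d")
print(value)   # 2020/08/03 -- two days before the 2020-08-05 run, not one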
You don't need the - macros.timedelta(days=1) in your value: the run that fires just after midnight on day N already carries an execution_date inside day N-1's interval, so subtracting another day lands you on day N-2.
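Based on the definition above, dropping the extra day shift should give the previous day relative to the actual run:

{{ (execution_date + macros.timedelta(hours=5, minutes=30)).strftime("%Y/%m/%d") }}

For the run firing at 00:20 IST on 2020-08-05 in the example above, this renders as 2020/08/04.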