I am expecting the ABC_df to display/return data in my databricks notebook. but its not returning the data though there is data for the period '2019-02-24' AND '2019-03-31' I tried using
WHERE DATE_FORMAT(E.EVENT_TIME_UTC,'yyyy-MM-dd') BETWEEN '2019-02-24' AND '2019-03-31'
where condition it works fine and successfully returns the data
But I want to implement a dynamic parameter,So I am using below options but data is not returning
BETWEEN CAST('{IterStartLagDatetime}' AS STRING) AND CAST('{IterEndDatetime}' AS STRING)
BETWEEN to_date('{IterStartLagDatetime}','yyyy-MM-dd') AND to_date('{IterEndDatetime}','yyyy-MM-dd')
BETWEEN DATE_FORMAT('{IterStartLagDatetime}','yyyy-MM-dd') AND DATE_FORMAT('{IterEndDatetime}','yyyy-MM-dd')
complete code is listed above. please help me with any other possible solutions
%python
BatchInsert_StartYear = 2019
BatchInsert_EndYear = 2019
while (BatchInsert_StartYear <= BatchInsert_EndYear):
print(BatchInsert_StartYear)
MonthCount = 1
while (MonthCount < 13):
if(MonthCount < 12):
IterEndDatetime = right('00'+ str(MonthCount+1),2)+'-01-'+ str(BatchInsert_StartYear)
IterEndDatetime = datetime.strptime(IterEndDatetime,'%m-%d-%Y')+ timedelta(days=-1)
IterEndDatetime = IterEndDatetime.strftime("%Y-%m-%d")
print(IterEndDatetime)
IterStartDatetime = right('00'+ str(MonthCount),2)+'-01-'+ str(BatchInsert_StartYear)
IterStartLagDatetime = datetime.strptime(IterStartDatetime,'%m-%d-%Y')+ timedelta(days=-5)
IterStartDatetime = datetime.strptime(IterStartDatetime,'%m-%d-%Y').strftime("%Y-%m-%d")
print(IterStartDatetime)
IterStartLagDatetime = IterStartLagDatetime.strftime("%Y-%m-%d")
print(IterStartLagDatetime)
else:
IterEndDatetime = right('00'+ str(1),2)+'-01-'+ str(BatchInsert_StartYear+1)
IterEndDatetime = datetime.strptime(IterEndDatetime,'%m-%d-%Y')+ timedelta(days=-1)
IterEndDatetime = IterEndDatetime.strftime("%Y-%m-%d")
print(IterEndDatetime)
IterStartDatetime = right('00'+ str(MonthCount),2)+'-01-'+ str(BatchInsert_StartYear)
IterStartLagDatetime = datetime.strptime(IterStartDatetime,'%m-%d-%Y')+ timedelta(days=-5)
IterStartDatetime = datetime.strptime(IterStartDatetime,'%m-%d-%Y').strftime("%Y-%m-%d")
print(IterStartDatetime)
IterStartLagDatetime = IterStartLagDatetime.strftime("%Y-%m-%d")
print(IterStartLagDatetime)
ABC_df = spark.sql('''
SELECT *
FROM Temp_HISTORICAL_SWIPE_DETAILS E
BETWEEN DATE_FORMAT('{IterStartLagDatetime}','yyyy-MM-dd') AND DATE_FORMAT('{IterEndDatetime}','yyyy-MM-dd')
''')
ABC_df.show()
From the given code, I can see that you are trying to reference variables directly inside spark.sql()
directly. Look at the following demonstration to understand how to use variables inside the query (string).
demo3
as an example.%sql
select * from demo3 WHERE DATE_FORMAT(dt,'yyyy-MM-dd') BETWEEN '2019-02-24' AND '2019-03-31'
Now let's say you want to use variables inside this query (you have created these date variables directly, but I have given the value directly for demonstration).
If I give the query same as yours, we will not get desired. You can use another variable query
to check what query is getting executed and then give that query
variable
startdate='2019-02-24'
enddate='2019-03-31'
query = '''select * from demo3 WHERE DATE_FORMAT(dt,'yyyy-MM-dd') BETWEEN '{startdate}' AND '{enddate}' '''
print(query)
df = spark.sql(query)
df.show()
startdate='2019-02-24'
enddate='2019-03-31'
#use f"...{variable}..." to reference variables.
query = f'''select * from demo3 WHERE DATE_FORMAT(dt,'yyyy-MM-dd') BETWEEN '{startdate}' AND '{enddate}' '''
print(query)
df = spark.sql(query)
df.show()