I am trying to code a typical T-SQL CASE WHEN statement in sparkSQL.
The T-SQL CASE WHEN statement looks as follows:
SELECT
*
,CASE
WHEN DatesTestv3.as_of_date = EOMONTH(DATEADD(MONTH, -1, SYSDATETIME())) THEN 'LM'
END AS 'Last Month End'
from dbo.DatesTestv3
My attempt at converting the above T-SQL to sparkSQL is as follows:
spark.sql("select *, case when as_of_date = last_day(add_months(current_date(),-1)) then 'LM' END AS 'Last Month End" from table)
However, I'm getting the ParseException error.
I'm sure it's probably a simple syntax issue.
Therefore, any thoughts on how to convert the T-SQL to sparkSQL?
As per my knowledge I have tried the Same CASE Statement in the Pyspark at my end.
I have created the table with below columns. id , as_of_date , last_month_end
from pyspark.sql.functions import to_date
spark.sql("CREATE TABLE data020 (id INT, as_of_date DATE, last_month_end STRING)
data = [
(1, "2023-06-01", "MA"),
(2, "2023-06-01", "MA"),
(3, "2023-06-01", "MA"),
(4, "2023-06-01", "MA"),
(5, "2023-06-01", "MA"),
(6, "2023-06-01", "MA"),
(7, "2023-06-01", "MA"),
(8, "2023-06-01", "MA"),
(9, "2023-06-01", "MA"),
(10, "2023-06-01", "MA")
]
df = spark.createDataFrame(data, ["id", "as_of_date", "last_month_end"])
df = df.withColumn("as_of_date", to_date(df.as_of_date))
df.write.mode("overwrite").insertInto("data020")
The CASE statement I have tried is some thing like this. I have tried in 2 different ways.
from pyspark.sql.functions import last_day, date_sub, current_date, when
data = spark.table("data_dry")
data = data_dry.withColumn("Last Month End", when(data.as_of_date == last_day(date_sub(current_date(), 1)), "LM").otherwise(None))
data_dry.display()
And
result = spark.sql("""
SELECT *,
CASE
WHEN as_of_date = last_day(date_add(current_date(), -1)) THEN 'LM'
ELSE NULL
END AS Last Month End
FROM data_dry
""")
result.display()