Search code examples
pysparkapache-spark-sqlazure-databricksdatabricks-sql

How to convert T-SQL CASE WHEN STATEMENT to Databricks sparkSQL


I am trying to code a typical T-SQL CASE WHEN statement in sparkSQL.

The T-SQL CASE WHEN statement looks as follows:

SELECT
  *
 ,CASE
    WHEN DatesTestv3.as_of_date = EOMONTH(DATEADD(MONTH, -1, SYSDATETIME())) THEN 'LM'
  END AS 'Last Month End'
from dbo.DatesTestv3

My attempt at converting the above T-SQL to sparkSQL is as follows:

spark.sql("select *, case when as_of_date = last_day(add_months(current_date(),-1)) then 'LM' END AS 'Last Month End" from table)

However, I'm getting the ParseException error.

I'm sure it's probably a simple syntax issue.

Therefore, any thoughts on how to convert the T-SQL to sparkSQL?


Solution

  • As per my knowledge I have tried the Same CASE Statement in the Pyspark at my end.

    I have created the table with below columns. id , as_of_date , last_month_end

    from pyspark.sql.functions import to_date
    spark.sql("CREATE TABLE data020 (id INT, as_of_date DATE, last_month_end STRING)
    data = [
    (1, "2023-06-01", "MA"),
    (2, "2023-06-01", "MA"),
    (3, "2023-06-01", "MA"),
    (4, "2023-06-01", "MA"),
    (5, "2023-06-01", "MA"),
    (6, "2023-06-01", "MA"),
    (7, "2023-06-01", "MA"),
    (8, "2023-06-01", "MA"),
    (9, "2023-06-01", "MA"),
    (10, "2023-06-01", "MA")
    ]
    df = spark.createDataFrame(data, ["id", "as_of_date", "last_month_end"])
    df = df.withColumn("as_of_date", to_date(df.as_of_date))
    df.write.mode("overwrite").insertInto("data020")
    

    enter image description here The CASE statement I have tried is some thing like this. I have tried in 2 different ways.

    from pyspark.sql.functions import last_day, date_sub, current_date, when
    data = spark.table("data_dry")
    data = data_dry.withColumn("Last Month End", when(data.as_of_date == last_day(date_sub(current_date(), 1)), "LM").otherwise(None))
    data_dry.display()
    

    And

    result = spark.sql("""
    

    SELECT *, CASE WHEN as_of_date = last_day(date_add(current_date(), -1)) THEN 'LM' ELSE NULL END AS Last Month End FROM data_dry """)

    result.display()

    enter image description here Also you can try with the Date_trunc function too.