
How to return a date format with PySpark with Databricks


The following PySpark code returns the field 'mydates' in the format

yyyy-MM-dd

df = sql("select * from mytable where mydates = last_day(add_months(current_date(),-1))")

However, I would like the code to return the 'mydates' field with the following format

yyyyMMdd

I tried the following

df = sql("select * from mytable where mydates = last_day(add_months(current_date(),'yyyyMMdd'-1))")

The above didn't raise an error, but it returned no results. The previous code did return results, but with the 'mydates' field formatted as yyyy-MM-dd, and I would like yyyyMMdd.

Any thoughts?

I have updated this question in line with the suggested answer, however I'm still getting yyyy-MM-dd.

%python
from pyspark.sql.functions import date_format
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql("select * from xxxxx where date_format(mydates, 'yyyyMMdd') = date_format(last_day(add_months(current_date(), -1)), 'yyyyMMdd')")

Very strange


Solution

  • The date format in my table is yyyy-MM-dd; I believe it is the same format in your table as well.

    No matter what format you use to check the condition in the where clause, you still need to reformat the field in the select clause, because the value stored in the table is already in yyyy-MM-dd.

    So you need to apply the format in the select clause itself.

    code:

    spark.sql("select  date_format(mydates,'yyyyMMdd') as mydates from mytable where mydates =  last_day(add_months(current_date(),-1))").show()
    
    spark.sql("select  date_format(mydates,'yyyyMMdd') as mydates from mytable where  date_format(mydates, 'yyyyMMdd') =  date_format(last_day(add_months(current_date(), -1)), 'yyyyMMdd')").show()
    

    One more way of doing this is with Spark's built-in function regexp_replace, applied after getting the result from the SQL query:

    from pyspark.sql.functions import col, regexp_replace
    spark.sql("select mydates from mytable").withColumn("dates-reformat", regexp_replace(col('mydates'), '-', '')).show()
    

    Output: the 'dates-reformat' column contains the same dates with the hyphens stripped, i.e. in yyyyMMdd form.