The following PySpark code returns the 'mydates' field in yyyy-MM-dd format:
df = sql("select * from mytable where mydates = last_day(add_months(current_date(),-1))")
However, I would like the code to return the 'mydates' field in yyyyMMdd format.
I tried the following:
df = sql("select * from mytable where mydates = last_day(add_months(current_date(),'yyyyMMdd'-1))")
The above didn't raise an error, but it returned no results. The original code did return results, but with the 'mydates' field formatted as yyyy-MM-dd, and I would like yyyyMMdd.
Any thoughts?
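For reference, here is what I expect the comparison value to look like, sketched in plain Python with only the standard library (the helper name `last_day_prev_month` is just for illustration):

```python
from datetime import date, timedelta

def last_day_prev_month(today: date) -> str:
    # The day before the first of the current month is the last day of
    # the previous month; strftime("%Y%m%d") gives the yyyyMMdd form.
    first_of_month = today.replace(day=1)
    return (first_of_month - timedelta(days=1)).strftime("%Y%m%d")

print(last_day_prev_month(date(2024, 3, 15)))  # -> 20240229 (leap year)
```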
I have updated this question in line with the suggested answer; however, I'm still getting yyyy-MM-dd.
%python
from pyspark.sql.functions import date_format
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql("select * from xxxxx where date_format(mydates, 'yyyyMMdd') = date_format(last_day(add_months(current_date(), -1)), 'yyyyMMdd')")
Very strange
Below is the date format in my table; I believe your table uses the same format. No matter what format you compare against in the where condition, you need to reformat the column in the select clause, because the column in the table is stored as yyyy-MM-dd. So you need to apply the formatting in the select clause itself.

Code:
spark.sql("select date_format(mydates,'yyyyMMdd') as mydates from mytable where mydates = last_day(add_months(current_date(),-1))").show()
spark.sql("select date_format(mydates,'yyyyMMdd') as mydates from mytable where date_format(mydates, 'yyyyMMdd') = date_format(last_day(add_months(current_date(), -1)), 'yyyyMMdd')").show()
One more way of doing this is with Spark's built-in function regexp_replace, applied after retrieving the column from the SQL query.
from pyspark.sql.functions import *
spark.sql("select mydates from mytable").withColumn("dates-reformat", regexp_replace(col('mydates'), '-', '')).show()