
How to format date in Spark SQL?


I need to transform this given date format: 2019-10-22 00:00:00 to this one: 2019-10-22T00:00:00.000Z

I know this can be done in some databases. In AWS Redshift, for example, you can achieve it with:

TO_DATE('{RUN_DATE_YYYY/MM/DD}', 'YYYY/MM/DD') || 'T00:00:00.000Z' AS VERSION_TIME

But my platform is Spark SQL, so that doesn't work for me. The best I could come up with is this:

concat(d2.VERSION_TIME, 'T00:00:00.000Z') as VERSION_TIME

which is a bit hacky and still not correct: it produces 2019-10-25 00:00:00T00:00:00.000Z, where the 00:00:00 in the middle of the string is redundant and cannot stay there.

Any insight would be greatly appreciated!


Solution

  • The natural way to do this is to parse the string with to_timestamp and re-emit it with date_format:

    spark.sql("""SELECT date_format(to_timestamp("2019-10-22 00:00:00", "yyyy-MM-dd HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") as date""").show(false)
    

    The result is:

    +------------------------+
    |date                    |
    +------------------------+
    |2019-10-22T00:00:00.000Z|
    +------------------------+
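    If you want to sanity-check the pattern outside of Spark, here is a minimal plain-Python sketch of the same transformation, assuming the input is a naive timestamp that should be rendered as UTC (the `to_iso_z` helper name is mine, not a Spark API):

    ```python
    from datetime import datetime

    def to_iso_z(ts: str) -> str:
        """Convert '2019-10-22 00:00:00' to '2019-10-22T00:00:00.000Z'."""
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # %f gives microseconds (6 digits); trim to milliseconds and append 'Z'
        return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

    print(to_iso_z("2019-10-22 00:00:00"))  # 2019-10-22T00:00:00.000Z
    ```

    Note that, as in the Spark answer, the 'Z' here is a literal suffix, not a time-zone conversion; if your source data is not already in UTC you would need to shift it first.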