Tags: scala, apache-spark, datetime, unix-timestamp

date_format doesn't handle timestamp with `00:00:00`


It formats the timestamp 2020-01-27 00:00:00 as 2020-01-27 12:00:00 instead of 2020-01-27 00:00:00:

import spark.sqlContext.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions.{typedLit, date_trunc, date_format}


scala>   val stamp = typedLit(new Timestamp(1580105949000L))
stamp: org.apache.spark.sql.Column = TIMESTAMP('2020-01-27 00:19:09.0')


scala>   var df_test = Seq(5).toDF("seq").select(
     |     stamp.as("unixtime"),
     |     date_trunc("HOUR", stamp).as("date_trunc"),
     |     date_format(date_trunc("HOUR", stamp), "yyyy-MM-dd hh:mm:ss").as("hour")
     |   )
df_test: org.apache.spark.sql.DataFrame = [unixtime: timestamp, date_trunc: timestamp ... 1 more field]


scala> df_test.show
+-------------------+-------------------+-------------------+
|           unixtime|         date_trunc|               hour|
+-------------------+-------------------+-------------------+
|2020-01-27 00:19:09|2020-01-27 00:00:00|2020-01-27 12:00:00|
+-------------------+-------------------+-------------------+

Solution

  • Your pattern should be yyyy-MM-dd HH:mm:ss.

    date_format, according to its documentation, uses specifiers supported by java.text.SimpleDateFormat:

    Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
    See SimpleDateFormat for valid date and time format patterns.

    See the SimpleDateFormat documentation for the full list of pattern letters.

    hh is the specifier for "Hour in am/pm (1-12)", so midnight (hour 0) is rendered as 12. You're looking for the hour-in-day specifier, HH ("Hour in day (0-23)").
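Since Spark's date_format (in legacy mode) delegates to SimpleDateFormat, the hh-vs-HH behavior can be reproduced with plain Java, no Spark session needed. A minimal sketch (the class name FormatDemo and the UTC time zone are illustrative choices, not from the original post):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FormatDemo {
    // Format a UTC instant with the given SimpleDateFormat pattern.
    static String format(String pattern, long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long midnight = 1580083200000L; // 2020-01-27 00:00:00 UTC
        // hh = clock-hour in am/pm (1-12): hour 0 is rendered as 12
        System.out.println(format("yyyy-MM-dd hh:mm:ss", midnight)); // 2020-01-27 12:00:00
        // HH = hour-of-day (0-23): hour 0 is rendered as 00
        System.out.println(format("yyyy-MM-dd HH:mm:ss", midnight)); // 2020-01-27 00:00:00
    }
}
```

The same one-character change fixes the original query: date_format(date_trunc("HOUR", stamp), "yyyy-MM-dd HH:mm:ss") produces 2020-01-27 00:00:00 as expected.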