It formats 2020-01-27 00:00:00
of type timestamp
as 2020-01-27 12:00:00
instead of 2020-01-27 00:00:00
import spark.sqlContext.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions.typedLit
scala> val stamp = typedLit(new Timestamp(1580105949000L))
stamp: org.apache.spark.sql.Column = TIMESTAMP('2020-01-27 00:19:09.0')
scala> var df_test = Seq(5).toDF("seq").select(
| stamp.as("unixtime"),
| date_trunc("HOUR", stamp).as("date_trunc"),
| date_format(date_trunc("HOUR", stamp), "yyyy-MM-dd hh:mm:ss").as("hour")
| )
df_test: org.apache.spark.sql.DataFrame = [unixtime: timestamp, date_trunc: timestamp ... 1 more field]
scala> df_test.show
+-------------------+-------------------+-------------------+
| unixtime| date_trunc| hour|
+-------------------+-------------------+-------------------+
|2020-01-27 00:19:09|2020-01-27 00:00:00|2020-01-27 12:00:00|
+-------------------+-------------------+-------------------+
Your pattern should be yyyy-MM-dd HH:mm:ss
.
date_format
, according to its documentation, uses specifiers supported by java.text.SimpleDateFormat
:
Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
See SimpleDateFormat for valid date and time format patterns.
SimpleDateFormat
's documentation can be found here
hh
is used for "Hour in am/pm (1-12)". You're looking for the hour in day specifier, which is HH
.