Tags: scala, apache-spark, datetime, unix-timestamp

date_format doesn't handle timestamp with `00:00:00`


It formats the timestamp 2020-01-27 00:00:00 as 2020-01-27 12:00:00 instead of 2020-01-27 00:00:00:

import spark.sqlContext.implicits._
import java.sql.Timestamp
import org.apache.spark.sql.functions.{typedLit, date_trunc, date_format}


scala>   val stamp = typedLit(new Timestamp(1580105949000L))
stamp: org.apache.spark.sql.Column = TIMESTAMP('2020-01-27 00:19:09.0')


scala>   var df_test = Seq(5).toDF("seq").select(
     |     stamp.as("unixtime"),
     |     date_trunc("HOUR", stamp).as("date_trunc"),
     |     date_format(date_trunc("HOUR", stamp), "yyyy-MM-dd hh:mm:ss").as("hour")
     |   )
df_test: org.apache.spark.sql.DataFrame = [unixtime: timestamp, date_trunc: timestamp ... 1 more field]


scala> df_test.show
+-------------------+-------------------+-------------------+
|           unixtime|         date_trunc|               hour|
+-------------------+-------------------+-------------------+
|2020-01-27 00:19:09|2020-01-27 00:00:00|2020-01-27 12:00:00|
+-------------------+-------------------+-------------------+

Solution

  • Your pattern should be yyyy-MM-dd HH:mm:ss.

    date_format, according to its documentation, uses specifiers supported by java.text.SimpleDateFormat:

    Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
    See SimpleDateFormat for valid date and time format patterns.

    See the SimpleDateFormat documentation for the full list of pattern letters.

    hh is the specifier for "Hour in am/pm (1-12)", so midnight (hour 0) is rendered as 12. You're looking for the hour-in-day specifier, HH ("Hour in day (0-23)").
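Since Spark's date_format (in legacy mode) delegates to SimpleDateFormat, the hh-vs-HH behavior can be reproduced with plain Java, no Spark session needed. A minimal sketch (the class name FormatDemo and the UTC time zone are illustrative choices, not from the original post):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FormatDemo {
    // Format a UTC instant with the given SimpleDateFormat pattern.
    static String format(String pattern, long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long midnight = 1580083200000L; // 2020-01-27 00:00:00 UTC
        // hh = clock-hour in am/pm (1-12): hour 0 is rendered as 12
        System.out.println(format("yyyy-MM-dd hh:mm:ss", midnight)); // 2020-01-27 12:00:00
        // HH = hour-of-day (0-23): hour 0 is rendered as 00
        System.out.println(format("yyyy-MM-dd HH:mm:ss", midnight)); // 2020-01-27 00:00:00
    }
}
```

The same one-character change fixes the original query: date_format(date_trunc("HOUR", stamp), "yyyy-MM-dd HH:mm:ss") produces 2020-01-27 00:00:00 as expected.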