Search code examples
scaladateapache-sparkunix-timestamp

spark scala long converts to timestamp with milliseconds in parquet dataframe


Could someone please guide me that how to convert long to timestamp with milliseconds? I know how to do to the yyyy-MM-dd HH:mm:ss But I would like to the milliseconds yyyy-MM-dd HH:mm:ss.SSS

My parquet structure is like this

|-- header: struct (nullable = true)
 |    |-- time: long (nullable = true)
...

One sample for time is 1600676073054:

Scala

scala> spark.sql("select from_unixtime(word) as ts, word from tmp_1").show(false)
+--------------------+-------------+
|ts                  |word         |
+--------------------+-------------+
|52693-05-28 18:30:54|1600676073054|
+--------------------+-------------+


scala> spark.sql("select from_unixtime(word/1000) as ts, word from tmp_1").show(false)
+-------------------+-------------+
|ts                 |word         |
+-------------------+-------------+
|2020-09-21 16:14:33|1600676073054|
+-------------------+-------------+


scala> spark.sql("select from_unixtime(word) as ts, word from tmp_1").show(false)
+--------------------+-------------+
|ts                  |word         |
+--------------------+-------------+
|52693-05-28 18:30:54|1600676073054|
+--------------------+-------------+

Sql Server

declare @StartDate datetime2(3) = '1970-01-01 00:00:00.000'
, @milliseconds bigint = 1600676073054
, @MillisecondsPerDay int = 60 * 60 * 24 * 1000 -- = 86400000

SELECT  DATEADD(MILLISECOND, TRY_CAST(@milliseconds % @millisecondsPerDay AS INT), DATEADD(DAY, TRY_CAST(@milliseconds / @millisecondsPerDay AS INT), @StartDate));
--2020-09-21 08:14:33.054

I would like to know how to convert out the 054 as milliseconds.

Thanks.


Solution

  • Spark does not support epoch milliseconds, so you need to divide it by a 1000.

    val df = spark.createDataFrame(
        Seq(
            
           (1, "1600676073054")
        )
    ).toDF("id","long_timestamp")
    
    
     df.withColumn(
            "timestamp_mili",
            (col("long_timestamp")/1000).cast("timestamp") 
        ).show(false)
        
      //+---+--------------+-----------------------+
      //|id |long_timestamp|timestamp_mili         |
      //+---+--------------+-----------------------+
      //|1  |1600676073054 |2020-09-21 08:14:33.054|
      //+---+--------------+-----------------------+