Tags: scala, apache-spark, apache-spark-sql, unix-timestamp

How to convert a column of string dates into a column of Unix epochs in Spark


I am new to Spark and Scala, and I would like to convert a column of string dates into Unix epochs. My DataFrame looks like this:

+----------+-------+
|   Dates  |Reports|
+----------+-------+
|2020-07-20|     34|
|2020-07-21|     86|
|2020-07-22|    129|
|2020-07-23|     98|
+----------+-------+

The output should be:
+----------+-------+
|   Dates  |Reports|
+----------+-------+
|1595203200|     34|
|1595289600|     86|
|1595376000|    129|
|1595462400|     98|
+----------+-------+

Solution

  • Use unix_timestamp.

    import org.apache.spark.sql.functions.unix_timestamp
    import spark.implicits._  // for toDF and the 'date column syntax (spark-shell provides `spark`)
    val df = Seq("2020-07-20").toDF("date")
    df.show
    // unix_timestamp parses the string with the given pattern into seconds since the Unix epoch
    df.withColumn("unix_time", unix_timestamp('date, "yyyy-MM-dd")).show
    
    +----------+
    |      date|
    +----------+
    |2020-07-20|
    +----------+
    
    +----------+----------+
    |      date| unix_time|
    +----------+----------+
    |2020-07-20|1595203200|
    +----------+----------+
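
  • Applied to the DataFrame from the question, the same call can overwrite the Dates column in place. A minimal sketch, assuming a spark-shell session (unix_timestamp interprets the string in the session time zone, so the values match the expected output when spark.sql.session.timeZone is UTC):

    import org.apache.spark.sql.functions.unix_timestamp
    import spark.implicits._

    // Rebuild the example DataFrame from the question
    val reports = Seq(
      ("2020-07-20", 34),
      ("2020-07-21", 86),
      ("2020-07-22", 129),
      ("2020-07-23", 98)
    ).toDF("Dates", "Reports")

    // Replace the string column with its Unix epoch (seconds)
    reports.withColumn("Dates", unix_timestamp($"Dates", "yyyy-MM-dd")).show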