Tags: python, pyspark, databricks, epoch

Converting Epoch Time to Timestamp in PySpark


I have a df with a column containing epoch time. The type of the epoch time column is string, and I want to convert it to a timestamp. I am using the following command:

from pyspark.sql.functions import from_unixtime
df = df.withColumn('start_time', from_unixtime(df.recognition_start_time, 'UTC'))
df.select('start_time').show(10, False)

But the command is giving me null as a result. What is the mistake I am making?

Epoch time: 1583471040000
Expected output: 2020-03-06 05:04:00

Solution

  • In from_unixtime we need to specify the expected output format (yyyy-MM-dd hh:mm:ss), and by using from_utc_timestamp we can convert the output from UTC to a given time zone (a DataFrame API sketch follows below).

    Example:

    spark.sql("""select from_utc_timestamp(from_unixtime("1583471040000"/1000,"yyyy-MM-dd hh:mm:ss"),"America/Chicago")""").show(truncate=False)
    +---------------------+
    |_c0                  |
    +---------------------+
    |2020-03-05 05:04:00.0|
    +---------------------+