
Date Time Conversions in PySpark


Can someone please explain to me how the epoch time below

  • epoch time/Unix timestamp: 1668443121840
  • converts to the date: 2022-11-14T16:25:21.840+0000

How does the conversion take place, and how can I tell whether an epoch timestamp is given in seconds, milliseconds, microseconds or nanoseconds?

Additionally, is there a function in PySpark to convert the date back to an epoch timestamp?

Thanks in advance!

I tried a number of methods, but I am not achieving the expected result:

import datetime

t = datetime.datetime.strptime('2021-11-12 02:12:23', '%Y-%m-%d %H:%M:%S')
print(t.strftime('%s'))  # '%s' is undocumented, platform-dependent and uses the local timezone

The problem is that I am not able to control the format or the precision in terms of seconds, milliseconds, microseconds or nanoseconds.
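
For reference, a minimal plain-Python sketch that pins the precision explicitly (assuming the input string is meant to be UTC; the variable names are illustrative):

from datetime import datetime, timezone

t = datetime.strptime('2021-11-12 02:12:23', '%Y-%m-%d %H:%M:%S')
t = t.replace(tzinfo=timezone.utc)  # treat the naive datetime as UTC

epoch_seconds = int(t.timestamp())         # 1636683143
epoch_millis = int(t.timestamp() * 1000)   # 1636683143000
print(epoch_seconds, epoch_millis)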


Solution

  • The epoch time/Unix timestamp uses a reference date of 00:00:00 UTC on 1 January 1970 and counts the seconds (or milliseconds) elapsed since that date.
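
    As a quick sanity check outside Spark, plain Python reproduces the conversion once you assume the value is in milliseconds and interpret it in UTC:

    from datetime import datetime, timezone

    ms = 1668443121840
    # divide by 1000 to get seconds; the fraction carries the milliseconds
    print(datetime.fromtimestamp(ms / 1000, tz=timezone.utc))
    # 2022-11-14 16:25:21.840000+00:00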

    The value you are looking for is in milliseconds, so you have to extract the millisecond fraction and concatenate it onto the epoch seconds:

    import pyspark.sql.functions as F

    df = spark.createDataFrame([('2022-11-14T16:25:21.840+0000',)]).toDF("timestamp")

    # unix_timestamp gives whole seconds since the epoch;
    # date_format with "SSS" gives the zero-padded millisecond fraction
    df\
    .withColumn("timestamp", F.to_timestamp(F.col("timestamp")))\
    .withColumn("epoch_seconds", F.unix_timestamp("timestamp"))\
    .withColumn("epoch_milliseconds", F.concat(F.unix_timestamp("timestamp"), F.date_format("timestamp", "SSS")))\
    .show(truncate=False)

    # +----------------------+-------------+------------------+
    # |timestamp             |epoch_seconds|epoch_milliseconds|
    # +----------------------+-------------+------------------+
    # |2022-11-14 16:25:21.84|1668443121   |1668443121840     |
    # +----------------------+-------------+------------------+
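
  • A bare epoch number carries no indication of its unit, so the usual way to identify it is by its order of magnitude: for dates in recent decades, seconds have about 10 digits, milliseconds 13, microseconds 16 and nanoseconds 19. A minimal sketch of that heuristic (the guess_epoch_unit helper is illustrative, not a Spark or Python built-in):

    def guess_epoch_unit(value: int) -> str:
        """Guess the unit of an epoch timestamp from its digit count.

        Heuristic only: assumes the timestamp refers to a date within
        roughly the last few decades.
        """
        digits = len(str(abs(value)))
        if digits <= 11:
            return "seconds"
        elif digits <= 14:
            return "milliseconds"
        elif digits <= 17:
            return "microseconds"
        return "nanoseconds"

    print(guess_epoch_unit(1668443121))     # seconds
    print(guess_epoch_unit(1668443121840))  # milliseconds

  • For the reverse direction, unix_timestamp (used above) is the PySpark function that converts a timestamp back to epoch seconds. If you want epoch milliseconds as a number rather than a concatenated string, one common alternative is to cast the timestamp to double (seconds with a fraction) and scale, e.g. (F.col("timestamp").cast("double") * 1000).cast("long").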