Search code examples
pythondataframepysparktimestampunix-timestamp

Convert Unix Timestamp into ms timestamp in PySpark


I have a column in my Dataframe timestamp, which contains UNIX 13-digit timestamps as follows:

|   timestamp   | 
| ------------- |
| 1584528257638 |
| 1586618807677 |
| 1585923477767 |
| 1583314882085 |

With pandas it is fairly easy to convert it like:

ms = pd.to_datetime(df[column], unit='ms')
df[column] = ms

However, in pySpark it is not that easy and I found some others, such as this post trying to achieve this goal. The concatenation of the last Milliseconds does not work for me, it always results in the Second timestamp (HH:mm:ss) instead of HH:mm:ss.SSS.

What I tried so far was:

df = df.withColumn("unix_timestamp", F.unix_timestamp(df.timestamp,'yyyy-MM-dd HH:mm:ss.SSS z') + F.substring(df.timestamp, -3,3).cast('float')/1000)

df = df.withColumn("ms_Timestamp", F.to_timestamp(df["unix_timestamp"]))

This unfortunately, didn't convert it into a millisecond timestamp and I have no idea what else to do.

I would appreciate any help to finally get a millisecond timestamp.

All the best and thanks in advance.


Solution

  • Default to_timestamp, from_unixtime, unix_timestamp functions will not result milliseconds.

    But to get workaround use from_unixtime and concat functions to get the timestamp with millisecs.

    #using substring function
    df.withColumn("unix_timestamp", concat_ws(".",from_unixtime(substring(col("timestamp"),0,10),"yyyy-MM-dd HH:mm:ss"),substring(col("timestamp"),-3,3))).show(10,False)
    
    #using divide function
    df.withColumn("unix_timestamp", concat_ws(".",from_unixtime((col("timestamp")/1000),"yyyy-MM-dd HH:mm:ss"),substring(col("timestamp"),-3,3))).show(10,False)
    #+-------------+-----------------------+
    #|timestamp    |unix_timestamp         |
    #+-------------+-----------------------+
    #|1584528257638|2020-03-18 05:44:17.638|
    #|1586618807677|2020-04-11 10:26:47.677|
    #|1585923477767|2020-04-03 09:17:57.767|
    #|1583314882085|2020-03-04 03:41:22.085|
    #+-------------+-----------------------+