Tags: apache-spark, datetime, pyspark, unix-timestamp

PySpark unix_timestamp stripping the last zeros while converting from datetime to Unix time


I have the following date dataframe:

en_dt_time
2020-10-12 04:00:00
2020-10-11 04:00:00
2020-10-10 04:00:00
2020-10-09 04:00:00
2020-10-08 04:00:00

When converting these dates to a Unix timestamp, the trailing zeros do not appear, giving me an incorrect time in Unix.

This is what I am applying:

df = df.withColumn('unix', F.unix_timestamp('en_dt_time'))

The output is missing the last three zeros (000):

en_dt_time          unix
2020-10-12 04:00:00 1602475200
2020-10-11 04:00:00 1602388800
2020-10-10 04:00:00 1602302400
2020-10-09 04:00:00 1602216000
2020-10-08 04:00:00 1602129600
2020-10-07 04:00:00 1602043200

The desired output is:

en_dt_time          unix
2020-10-12 04:00:00 1602475200000
2020-10-11 04:00:00 1602388800000
2020-10-10 04:00:00 1602302400000
2020-10-09 04:00:00 1602216000000
2020-10-08 04:00:00 1602129600000
2020-10-07 04:00:00 1602043200000

How can I get this precision when converting to a Unix timestamp? I was able to generate it by multiplying the output by 1000:

df = df.withColumn('unix', F.unix_timestamp('en_dt_time')*1000)

Is this the right approach?


Solution

  • Yes, that is the correct behavior. From the function's description:

    Convert time string with given pattern (‘yyyy-MM-dd HH:mm:ss’, by default) to Unix time stamp (in seconds), using the default timezone and the default locale

    `unix_timestamp` returns whole seconds since the epoch, while your desired output is in milliseconds. So if you want milliseconds, multiplying the seconds by 1000, as you are doing now, is the right approach.
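    As a sanity check on the arithmetic (a minimal sketch in plain Python rather than Spark, assuming the example timestamps are interpreted as UTC — in Spark, `unix_timestamp` uses the session timezone):

    ```python
    from datetime import datetime, timezone

    # 2020-10-12 04:00:00, interpreted as UTC (this matches the
    # 1602475200 value shown in the question's output)
    dt = datetime(2020, 10, 12, 4, 0, 0, tzinfo=timezone.utc)

    seconds = int(dt.timestamp())  # whole seconds, like unix_timestamp
    millis = seconds * 1000        # the desired millisecond value

    print(seconds)  # 1602475200
    print(millis)   # 1602475200000
    ```

    The multiplication is exact because `unix_timestamp` already returns an integral number of seconds; no precision is lost, the result is simply rescaled to milliseconds.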