Tags: date, pyspark, aws-glue, aws-glue-data-catalog

What is the best way to cast or handle the date datatype in PySpark?


Can you please help me cast the datatype below in PySpark in the best possible way? We can't handle this in the dataframe.

Input:

Aug 11, 2020 04:34:54.0 PM

to expected output:

2020-08-11 04:34:54:00 PM

Solution

  • Use the from_unixtime and unix_timestamp functions.

    Example:

    # required imports
    from pyspark.sql.functions import col, from_unixtime, unix_timestamp, to_timestamp

    # sample data in dataframe
    df.show(10, False)
    #+--------------------------+
    #|ts                        |
    #+--------------------------+
    #|Aug 11, 2020 04:34:54.0 PM|
    #+--------------------------+

    # parse the string with unix_timestamp, then reformat with from_unixtime
    df.withColumn("dt", from_unixtime(unix_timestamp(col("ts"), "MMM d, yyyy hh:mm:ss.SSS a"), "yyyy-MM-dd hh:mm:ss.SSS a")).\
        show(10, False)
    #+--------------------------+--------------------------+
    #|ts                        |dt                        |
    #+--------------------------+--------------------------+
    #|Aug 11, 2020 04:34:54.0 PM|2020-08-11 04:34:54.000 PM|
    #+--------------------------+--------------------------+
    

    If you want the new column to be of timestamp type, use Spark's to_timestamp function instead.

    df.withColumn("dt",to_timestamp(col("ts"),"MMM d, yyyy hh:mm:ss.SSS a")).\
    show(10,False)
    #+--------------------------+-------------------+
    #|ts                        |dt                 |
    #+--------------------------+-------------------+
    #|Aug 11, 2020 04:34:54.0 PM|2020-08-11 16:34:54|
    #+--------------------------+-------------------+
    
    df.withColumn("dt",to_timestamp(col("ts"),"MMM d, yyyy hh:mm:ss.SSS a")).printSchema()
    #root
    # |-- ts: string (nullable = true)
    # |-- dt: timestamp (nullable = true)
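
    Note that Spark uses Java date patterns (`MMM`, `hh`, `a`), while plain Python uses strftime codes. As a minimal sketch for checking the conversion logic locally without a Spark session, the same parse-and-reformat round trip can be written with Python's `datetime` (the strftime codes `%b`, `%I`, `%f`, `%p` here are the rough Python equivalents of the Spark pattern above):

    ```python
    from datetime import datetime

    raw = "Aug 11, 2020 04:34:54.0 PM"

    # %b = abbreviated month, %I = 12-hour clock, %f = fractional seconds
    # (1-6 digits, so ".0" parses), %p = AM/PM marker
    parsed = datetime.strptime(raw, "%b %d, %Y %I:%M:%S.%f %p")

    # 12-hour string form, similar to the formatted column in the answer
    twelve_hour = parsed.strftime("%Y-%m-%d %I:%M:%S %p")

    # 24-hour form, matching what to_timestamp stores internally
    twenty_four_hour = parsed.strftime("%Y-%m-%d %H:%M:%S")
    ```

    This is only a local sanity check of the format logic; in the actual job the Spark functions above do the work distributed across the cluster.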