I'm using Databricks to ingest a CSV and have a column that needs casting from a string to a timestamp. The data comes in as a string in this format: 31-MAR-27 10.59.00.000000 PM GMT
The code is Python and the cluster is running Spark 3.0.1. I've used the code below before, but never with the AM/PM marker or time zone, and I can't find a format that doesn't return null for every value:
df.withColumn('columnName', to_timestamp(col('columnName'),'dd-MMM-yy HH.mm.ss.SSSSSS a zzz').cast(TimestampType()))
I'm using this documentation to help: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Since you have AM/PM in your timestamp, you should use lowercase h, which is clock-hour-of-am-pm (1-12), not uppercase H, which is hour-of-day (0-23).
df2 = df.withColumn(
'columnName',
to_timestamp(col('columnName'), 'dd-MMM-yy hh.mm.ss.SSSSSS a zzz')
)
Note that there is no need to cast to TimestampType, because to_timestamp already returns a timestamp-typed column.
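For completeness, here's a minimal, self-contained sketch you can run to verify the pattern, assuming a single-row test DataFrame built from the sample value in your question (the column name columnName is taken from your snippet):
df = spark.createDataFrame(
    [('31-MAR-27 10.59.00.000000 PM GMT',)],
    ['columnName']
)

df2 = df.withColumn(
    'columnName',
    to_timestamp(col('columnName'), 'dd-MMM-yy hh.mm.ss.SSSSSS a zzz')
)

df2.printSchema()           # columnName should now be timestamp, not string
df2.show(truncate=False)    # the parsed value, rendered in the session time zone
If the column still comes back as string or null, double-check that you imported the functions (from pyspark.sql.functions import to_timestamp, col) rather than relying on the built-in Python names.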