I have date in the String format 20240107193154
within databricks dataframe, I want to convert this into YYY-MM-DD:HH:MM:SS
datetime format within Pyspark .
I have tried using unixtimestmp but no luck.
We'll need 2 functions to achieve your goal:
Here's an example of how we might achieve this:
import pyspark.sql.functions as F
from pyspark.sql.types import *
# example data
data = [
['a', '20240104193154'],
['b', '20240105193154'],
['c', '20240106193154'],
['d', '20240107193154'],
['e', '20240108193154'],
['f', '20240109193154']]
schema = StructType([
StructField("Event", StringType(), True),
StructField("StringTimeStamp", StringType(), True)
])
# create dataframe
df = spark.createDataFrame(data, schema)
# F.to_timestamp converts the string format to a timestamp
# F.date_format converts the timestamp to desired string format
(
df
.withColumn('Timestamp',F.to_timestamp(F.col('StringTimeStamp'),'yyyyMMddHHmmss'))
.withColumn('FormattedString',F.date_format(F.col('Timestamp'),'yyyy-MM-dd:hh:mm:ss'))
).show()
We can see the results:
+-----+---------------+-------------------+-------------------+
|Event|StringTimeStamp| Timestamp| FormattedString|
+-----+---------------+-------------------+-------------------+
| a| 20240104193154|2024-01-04 19:31:54|2024-01-04:07:31:54|
| b| 20240105193154|2024-01-05 19:31:54|2024-01-05:07:31:54|
| c| 20240106193154|2024-01-06 19:31:54|2024-01-06:07:31:54|
| d| 20240107193154|2024-01-07 19:31:54|2024-01-07:07:31:54|
| e| 20240108193154|2024-01-08 19:31:54|2024-01-08:07:31:54|
| f| 20240109193154|2024-01-09 19:31:54|2024-01-09:07:31:54|
+-----+---------------+-------------------+-------------------+