In Azure Databricks (PySpark), I already have an existing DataFrame, df1:
Name | date |
---|---|
A | 20210720 |
B | 20231005 |
C | 20190215 |
I would like the date column to be formatted as below:
Name | date |
---|---|
A | 07/20/2021 |
B | 10/05/2023 |
C | 02/15/2019 |
How should I write the script?
Thank you.
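You can do something like the below. Here I am assuming the date column is of string type. Note that this example formats the result as dd/MM/yyyy; for the MM/dd/yyyy layout shown in the question, pass "MM/dd/yyyy" to date_format instead.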
from pyspark.sql.functions import date_format, to_date
data = [("20231030",), ("20231115",), ("20231225",)]
columns = ["date_string"]
df = spark.createDataFrame(data, columns)
df = df.withColumn("to_date_format", to_date(df["date_string"], "yyyyMMdd"))
df = df.withColumn("formatted_date", date_format(df["to_date_format"], "dd/MM/yyyy"))
df.show()
df.printSchema()
Output:
>>> df.show()
+-----------+--------------+--------------+
|date_string|to_date_format|formatted_date|
+-----------+--------------+--------------+
|   20231030|    2023-10-30|    30/10/2023|
|   20231115|    2023-11-15|    15/11/2023|
|   20231225|    2023-12-25|    25/12/2023|
+-----------+--------------+--------------+
>>> df.printSchema()
root
 |-- date_string: string (nullable = true)
 |-- to_date_format: date (nullable = true)
 |-- formatted_date: string (nullable = true)