I have a dataset with a date column stored as a string, and I want to convert it to a date type in PySpark. How can I do this? I tried a few format combinations, but they all give me null dates.
Sample data:
id | date | value |
---|---|---|
A | 04/Oct/2022 | 5 |
B | 10/Jan/2023 | 15 |
Expected output:
id | date | value |
---|---|---|
A | 2022-10-04 | 5 |
B | 2023-01-10 | 15 |
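Use PySpark's built-in function to_date(<COL>, <date_format>) for this case. The format has to match the string exactly: dd for the day of month, MMM for the abbreviated month name, and yyyy for the four-digit year. A pattern that does not match the input typically gives null.
Example: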
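from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

# Reuse the active session (already available as `spark` in the pyspark shell and most notebooks)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('A', '04/Oct/2022', 5), ('B', '10/Jan/2023', 15)], ['id', 'date', 'value'])
# dd = day of month, MMM = abbreviated month name, yyyy = four-digit year
df.withColumn("date", to_date(col("date"), "dd/MMM/yyyy")).show(10, False)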
#+---+----------+-----+
#|id |date      |value|
#+---+----------+-----+
#|A  |2022-10-04|5    |
#|B  |2023-01-10|15   |
#+---+----------+-----+
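If you want to sanity-check the shape of the date string without starting Spark, here is a minimal plain-Python sketch. It uses datetime.strptime, not Spark, and the directive syntax is different: %d/%b/%Y here plays the role of dd/MMM/yyyy in to_date. The helper name parse_sample is just made up for illustration, and %b depends on the locale, so this assumes English month abbreviations.

```python
from datetime import date, datetime


def parse_sample(text: str) -> date:
    """Parse a string such as '04/Oct/2022' into a date (hypothetical helper)."""
    # %d = day of month, %b = abbreviated month name (locale-dependent), %Y = four-digit year
    return datetime.strptime(text, "%d/%b/%Y").date()


samples = ["04/Oct/2022", "10/Jan/2023"]
# ISO formatting gives the same yyyy-MM-dd shape that Spark shows for a date column
parsed = [parse_sample(s).isoformat() for s in samples]
print(parsed)
```

If strptime raises ValueError on one of your real values, the string likely has a different layout than you expected (extra spaces, two-digit year, full month name), and the Spark pattern would need to change accordingly.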