
Convert string dd/mmm/YYYY to yyyy-mm-dd in pyspark


I have a dataset with a date column stored as a string, and I want to convert it to a proper date type in PySpark. How do I achieve this? The few format combinations I tried all give me null dates.

Sample data:

id date value
A 04/Oct/2022 5
B 10/Jan/2023 15

Expected output:

id date value
A 2022-10-04 5
B 2023-01-10 15

Solution

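  • Use PySpark's built-in to_date(<col>, <format>) function with the pattern "dd/MMM/yyyy". Pattern letters are case-sensitive: MMM is the abbreviated month name (Oct), yyyy is the year, and lowercase mm means minutes, so a pattern such as "dd/mm/yyyy" will not match these strings: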

    Example:

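    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [('A', '04/Oct/2022', 5), ('B', '10/Jan/2023', 15)],
        ['id', 'date', 'value'],
    )
    # MMM = abbreviated month name (e.g. Oct), yyyy = year; the result is a DateType column
    df.withColumn("date", to_date(col("date"), "dd/MMM/yyyy")).show(10, False)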
    #+---+----------+-----+
    #|id |date      |value|
    #+---+----------+-----+
    #|A  |2022-10-04|5    |
    #|B  |2023-01-10|15   |
    #+---+----------+-----+
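If you want to sanity-check the format on a few strings outside Spark, the same mapping can be sketched in plain Python with the standard library's datetime.strptime, where Spark's dd/MMM/yyyy corresponds roughly to %d/%b/%Y. This is a swapped-in, non-Spark technique for illustration only; the helper name to_iso_date is hypothetical, and the sketch assumes the default C locale so that %b matches English month abbreviations such as "Oct".

```python
from datetime import datetime


# Hypothetical helper (not part of PySpark): converts a "dd/Mon/yyyy" string
# such as "04/Oct/2022" to an ISO "yyyy-mm-dd" string.
# %b matches abbreviated month names in the current locale; Python starts in
# the C locale, where these are the English names "Jan".."Dec".
def to_iso_date(s: str) -> str:
    return datetime.strptime(s, "%d/%b/%Y").date().isoformat()


rows = [("A", "04/Oct/2022", 5), ("B", "10/Jan/2023", 15)]
converted = [(row_id, to_iso_date(d), value) for row_id, d, value in rows]

for row in converted:
    print(row)
# ('A', '2022-10-04', 5)
# ('B', '2023-01-10', 15)
```

In Spark itself, to_date is still the better choice, since it runs inside the JVM without the serialization overhead a Python UDF would add.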