Search code examples
pyspark

Different date format & reverse date


I have string type date data like the below;

date(String) new date(String)
6/23/2024 2024-06-23
1/8/2023 2023-08-01
2024-05-20 2024-05-20

how can I change from date(string) column to new date(string) format using pyspark? They both are string types


Solution

  • Basically, you can create a list of all the formats which could be expected in the date column and then choose the first non-null one. You can also determine the order in case of ambiguous dates

    e.g formats = ["d/M/yyyy", "M/d/yyyy", "yyyy-MM-dd"]

    "6/23/2024"  # assuming 6 is month and 23 is date since 23 can't be a month
    "1/8/2023"  # this means 1 is month and 8 is date
    To handle the output, we have put `"d/M/yyyy", "M/d/yyyy"` in order, so 
    the first format is selected first. But this will be ambiguous 
    if the M and d values matches for the first format and doesn't produce a null value.
        
    

    Script below :

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date, date_format, coalesce
    
    spark = SparkSession.builder.appName("example").getOrCreate()
    
    data = [("6/23/2024",),  # assuming 6 is month and 23 is date since 23 can't be a month
            ("1/8/2023",),  # this means 1 is month and 8 is date
            ("2024-05-20",)]
    columns = ["date"]
    df = spark.createDataFrame(data, columns)
    
    
    def standardize_date(df, input_col):
        formats = ["M/d/yyyy", "yyyy-MM-dd"]
        date_cols = [to_date(col(input_col), f).alias(f"parsed_{f}") for f in formats]
        df = df.select("*", *date_cols)
        standardized_col = coalesce(*[col(f"parsed_{f}") for f in formats])
        df = df.withColumn("new_date", date_format(standardized_col, "yyyy-MM-dd"))
        df = df.select(input_col, "new_date")
        return df
    
    
    result_df = standardize_date(df, "date")
    result_df.show()
    

    Output :

    +----------+----------+
    |      date|  new_date|
    +----------+----------+
    | 6/23/2024|2024-06-23|
    |  1/8/2023|2023-08-01|
    |2024-05-20|2024-05-20|
    +----------+----------+