
How to convert the date format 'yyyy-MM-dd' to 'ddMMyy' in PySpark?


I tried to convert the date 2018-07-12 to the ddMMyy format using to_date, but I get null after the conversion:

from pyspark.sql.functions import to_date

df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])

df = df.withColumn('new_date', to_date('Date_col', 'ddMMyy'))

I need to apply this logic to a DataFrame column. I am new to Spark programming and have tried a lot of solutions, but none of them worked.

I also need to concatenate the ddMMyy value from one column with the hhss value from another column.

Any help would be appreciated.
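The null is caused by a pattern mismatch. In `to_date(col, fmt)`, `fmt` describes how the *input* string is laid out, so `'ddMMyy'` cannot parse `'2018-07-12'`. Getting `120718` takes two steps: parse with the input pattern, then format with the output pattern. Plain Python's `datetime` shows the same split, with `%Y-%m-%d` and `%d%m%y` standing in for Spark's `yyyy-MM-dd` and `ddMMyy`. The helper `reformat_date` is hypothetical and only illustrates the idea. It is a minimal sketch, not Spark's own behaviour.

```python
from datetime import datetime
from typing import Optional


def reformat_date(value: Optional[str],
                  in_fmt: str = "%Y-%m-%d",
                  out_fmt: str = "%d%m%y") -> Optional[str]:
    """Hypothetical helper: parse `value` with `in_fmt`, then render it with `out_fmt`.

    Returns None when the string does not match `in_fmt`, loosely mirroring
    the null the question gets from to_date on a pattern mismatch.
    """
    if value is None:
        return None
    try:
        parsed = datetime.strptime(value, in_fmt)  # step 1: parse the input layout
    except ValueError:
        return None  # the string does not fit the parse pattern
    return parsed.strftime(out_fmt)  # step 2: emit the desired layout


# Parsing with the *output* pattern fails, like to_date('Date_col', 'ddMMyy').
print(reformat_date("2018-07-12", in_fmt="%d%m%y"))  # None
# Parsing with the real input pattern, then formatting, gives the wanted string.
print(reformat_date("2018-07-12"))  # 120718
```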


Solution

  • First, let's create the DataFrame:

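    from pyspark.sql import SparkSession
    
    # In the pyspark shell `spark` already exists; getOrCreate() returns it.
    spark = SparkSession.builder.getOrCreate()
    
    df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])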
    df.show()
    
    +----------+
    |  Date_col|
    +----------+
    |2018-07-12|
    +----------+
    

    Next, define a UDF (user-defined function) that parses the string with the input pattern and formats it with the output pattern:

    from datetime import datetime
    import pyspark.sql.types as T
    import pyspark.sql.functions as F
    
    
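    def user_defined_timestamp(date_col):
        # Null values arrive in the UDF as None; pass them through instead of failing.
        if date_col is None:
            return None
        # Parse the 'yyyy-MM-dd' string, then render it as ddMMyy (e.g. 120718).
        _date = datetime.strptime(date_col, '%Y-%m-%d')
        return _date.strftime('%d%m%y')
    
    # Wrap the Python function as a Spark UDF that returns a string column.
    user_defined_timestamp_udf = F.udf(user_defined_timestamp, T.StringType())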
    

    Finally, apply the UDF to the DataFrame to create the new column:

    df = df.withColumn('new_date', user_defined_timestamp_udf('Date_col'))
    df.show()
    
    +----------+--------+
    |  Date_col|new_date|
    +----------+--------+
    |2018-07-12|  120718|
    +----------+--------+