I tried to convert a date in the format 2018-07-12 to ddMMyy using to_date, but I get null after the conversion:
from pyspark.sql.functions import to_date
df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])
df = df.withColumn('new_date', to_date('Date_col', 'ddMMyy'))
I need to use this logic to convert a DataFrame column. I am new to Spark programming and have tried a lot of solutions, but nothing has helped.
I also need to concatenate the ddMMyy value from one column with the hhss value from another column.
Any help please?
First, let's create the DataFrame:
df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])
df.show()
+----------+
|  Date_col|
+----------+
|2018-07-12|
+----------+
Then we define a UDF that parses the string as yyyy-MM-dd and formats it as ddMMyy:
from datetime import datetime
import pyspark.sql.types as T
import pyspark.sql.functions as F
def user_defined_timestamp(date_col):
    _date = datetime.strptime(date_col, '%Y-%m-%d')
    return _date.strftime('%d%m%y')
user_defined_timestamp_udf = F.udf(user_defined_timestamp, T.StringType())
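One thing to watch for: Spark passes null values to a Python UDF as None, and datetime.strptime raises a TypeError on None. Below is a minimal sketch of a None-safe variant of the same conversion that can be checked in plain Python before wrapping it with F.udf. The name to_ddmmyy is only illustrative and is not part of the original answer.

```python
from datetime import datetime

def to_ddmmyy(date_str):
    """Convert a 'YYYY-MM-DD' string to 'ddMMyy'; pass None through unchanged."""
    if date_str is None:
        # Null cells arrive as None in a Python UDF, so return None instead of failing.
        return None
    return datetime.strptime(date_str, '%Y-%m-%d').strftime('%d%m%y')

print(to_ddmmyy('2018-07-12'))  # 120718
print(to_ddmmyy(None))          # None
```

It can be registered the same way as the function above, for example F.udf(to_ddmmyy, T.StringType()).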
Finally, we apply the UDF to the DataFrame to create the column we want:
df = df.withColumn('new_date', user_defined_timestamp_udf('Date_col'))
df.show()
+----------+--------+
|  Date_col|new_date|
+----------+--------+
|2018-07-12|  120718|
+----------+--------+