Tags: python, python-3.x, pyspark, apache-spark-sql, pyspark-pandas

How to cast Date column from string to datetime in pyspark/python?


I have a date column that is inferred as string datatype in pyspark:

Mon Oct 17 15:57:48 EST 2022

How can I cast this string to a datetime/timestamp type?


Solution

  • You can use the matching datetime format pattern - 'E MMM dd HH:mm:ss z yyyy'. The resulting timestamp will be in UTC and, thus, you'll see that 5 hours are added to the source EST timestamp. Setting spark.sql.legacy.timeParserPolicy to LEGACY makes Spark fall back to the pre-3.0 SimpleDateFormat-style parser, which handles this pattern and the three-letter zone name EST.

    from pyspark.sql import functions as func

    # fall back to the pre-Spark-3 parser so the 'E'/'z' pattern and the 'EST' zone name are accepted
    spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')
    
    spark.sparkContext.parallelize([('Mon Oct 17 15:57:48 EST 2022', )]).toDF(['dt_str']). \
        withColumn('dt', func.to_timestamp('dt_str', 'E MMM dd HH:mm:ss z yyyy')). \
        show(truncate=False)
    
    # +----------------------------+-------------------+
    # |dt_str                      |dt                 |
    # +----------------------------+-------------------+
    # |Mon Oct 17 15:57:48 EST 2022|2022-10-17 20:57:48|
    # +----------------------------+-------------------+
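
  • If you would rather see the value in Eastern time instead of UTC, one option (a sketch that is not part of the original answer; the target zone name 'America/New_York' is an assumption) is to convert the parsed UTC timestamp with from_utc_timestamp:

    from pyspark.sql import functions as func

    df = spark.sparkContext.parallelize([('Mon Oct 17 15:57:48 EST 2022', )]).toDF(['dt_str'])

    df.withColumn('dt', func.to_timestamp('dt_str', 'E MMM dd HH:mm:ss z yyyy')). \
        withColumn('dt_local', func.from_utc_timestamp('dt', 'America/New_York')). \
        show(truncate=False)

    # note: America/New_York observes EDT (UTC-4) on 2022-10-17, so dt_local shows 16:57:48
    # rather than the original 15:57:48, which was stamped with fixed-offset EST (UTC-5)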