I imported a CSV file by parsing it with pandas, then converted it to Parquet format. The data reads Time as bigint and Date as string.

I can convert the date using the to_date function, i.e. `df.withColumn('ObservationDate', to_date('ObservationDate'))`.

However, I am struggling to change the Time column to **HH** format, and also to add the hours to the Date column. I have looked at the Spark documentation, but I cannot find what I am looking for.
Another alternative: use `lpad` to convert the hour to **HH** format, then add it with `unix_timestamp` arithmetic after setting `UTC` as the Spark session time zone.
df.show(false)
df.printSchema()
/**
* +---------------+-------------------+
* |ObservationTime|ObservationDate |
* +---------------+-------------------+
* |1 |2016-02-01T00:00:00|
* |12 |2016-02-01T00:00:00|
* +---------------+-------------------+
*
* root
* |-- ObservationTime: integer (nullable = false)
* |-- ObservationDate: string (nullable = true)
*/
spark.conf.set("spark.sql.session.timeZone", "UTC")
df.withColumn("ObservationTime", lpad($"ObservationTime", 2, "0"))
.withColumn("new_ObservationDate",
(unix_timestamp($"ObservationDate".cast("timestamp")).cast("long") + unix_timestamp($"ObservationTime", "HH").cast("long"))
.cast("timestamp")
)
.show(false)
/**
* +---------------+-------------------+-------------------+
* |ObservationTime|ObservationDate |new_ObservationDate|
* +---------------+-------------------+-------------------+
* |01 |2016-02-01T00:00:00|2016-02-01 01:00:00|
* |12 |2016-02-01T00:00:00|2016-02-01 12:00:00|
* +---------------+-------------------+-------------------+
*/
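If the zero-padded **HH** string is only needed for display, the second `unix_timestamp` parse can be skipped entirely by adding the hour offset in seconds. A minimal sketch, assuming the same column names and a Spark 2.x+ session named `spark`:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Epoch seconds of the date plus the hour offset in seconds,
// cast back to a timestamp. 12 hours -> 12 * 3600 seconds.
df.withColumn(
    "new_ObservationDate",
    (unix_timestamp($"ObservationDate".cast("timestamp")) + $"ObservationTime" * 3600)
      .cast("timestamp")
  )
  .show(false)
```

This produces the same `new_ObservationDate` values as above while parsing the date only once; the `lpad` step remains purely cosmetic.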