
Pyspark Format Dates


I imported a CSV file by parsing it with pandas, then converted it to Parquet format.

The data reads Time as bigint and Date as string.

I can change the date using the to_date function i.e. df.withColumn('ObservationDate', to_date('ObservationDate')).

However, I am struggling to change the Time column to HH format, and I am also struggling to add the hours to the Date column. I have looked at the Spark documentation, but I am not able to find what I am looking for.



Solution

  • Another alternative:

    Use lpad to convert the hour to **HH** format, then add the two **unix_timestamp** values after setting UTC as the Spark session timezone.

        df.show(false)
        df.printSchema()
    
        /**
          * +---------------+-------------------+
          * |ObservationTime|ObservationDate    |
          * +---------------+-------------------+
          * |1              |2016-02-01T00:00:00|
          * |12             |2016-02-01T00:00:00|
          * +---------------+-------------------+
          *
          * root
          * |-- ObservationTime: integer (nullable = false)
          * |-- ObservationDate: string (nullable = true)
          */
    
        spark.conf.set("spark.sql.session.timeZone", "UTC")
        df.withColumn("ObservationTime", lpad($"ObservationTime", 2, "0"))
          .withColumn("new_ObservationDate",
            (unix_timestamp($"ObservationDate".cast("timestamp")).cast("long") + unix_timestamp($"ObservationTime", "HH").cast("long"))
              .cast("timestamp")
          )
          .show(false)
    
        /**
          * +---------------+-------------------+-------------------+
          * |ObservationTime|ObservationDate    |new_ObservationDate|
          * +---------------+-------------------+-------------------+
          * |01             |2016-02-01T00:00:00|2016-02-01 01:00:00|
          * |12             |2016-02-01T00:00:00|2016-02-01 12:00:00|
          * +---------------+-------------------+-------------------+
          */