Tags: pyspark, unix-timestamp

24h clock with from_unixtime


I need to transform a dataframe with a column of timestamps in Unixtime/LongType-Format to actual TimestampType.

According to epochconverter.com:

  • 1646732321 = 8 March 2022 10:38:41 GMT+1
  • 1646768324 = 8 March 2022 20:38:44 GMT+1
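
For reference, these epoch values can be double-checked with Python's standard library, independent of Spark (a quick sketch; the fixed GMT+1 offset matches the epochconverter.com readings above):

from datetime import datetime, timezone, timedelta

# Fixed GMT+1 offset, matching the epochconverter.com output
tz = timezone(timedelta(hours=1))
for ts in (1646732321, 1646768324):
    print(datetime.fromtimestamp(ts, tz))
# 2022-03-08 10:38:41+01:00
# 2022-03-08 20:38:44+01:00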

However, when I apply from_unixtime to the dataframe, the hours come out on a 12-hour clock, so the second timestamp effectively loses 12 hours. How can I tell PySpark to use a 24-hour clock?

The output of the code below is:

+---+----------+-------------------+
|id |mytime    |mytime_new         |
+---+----------+-------------------+
|ABC|1646732321|2022-03-08 10:38:41|
|DFG|1646768324|2022-03-08 08:38:44|
+---+----------+-------------------+

The second line should be 2022-03-08 20:38:44.

Reproducible code example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_unixtime
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

# Sample data: id plus Unix epoch seconds (LongType)
data = [
    ("ABC", 1646732321),
    ("DFG", 1646768324),
]
schema = StructType(
    [
        StructField("id", StringType(), True),
        StructField("mytime", LongType(), True),
    ]
)
df = spark.createDataFrame(data, schema)
df = df.withColumn(
    "mytime_new",
    from_unixtime(df["mytime"], "yyyy-MM-dd hh:mm:ss"),
)
df.show(10, False)

Solution

  • Found my mistake 3 minutes later... the issue was the hour field (hh) in my timestamp format string:

    Instead of:

    from_unixtime(df["mytime"], "yyyy-MM-dd hh:mm:ss"),

    I needed:

    from_unixtime(df["mytime"], "yyyy-MM-dd HH:mm:ss"),
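
    Background: in Spark's datetime pattern syntax (Java DateTimeFormatter patterns in Spark 3.x), hh is the clock-hour-of-am-pm (01-12), while HH is the hour-of-day (00-23), which is why 20:38 was rendered as 08:38. A minimal end-to-end check with the corrected pattern (assuming the same df as above and a GMT+1 session time zone, matching the question):

    from pyspark.sql.functions import from_unixtime

    df = df.withColumn(
        "mytime_new",
        from_unixtime(df["mytime"], "yyyy-MM-dd HH:mm:ss"),  # HH = 24-hour clock
    )
    df.show(10, False)
    # +---+----------+-------------------+
    # |id |mytime    |mytime_new         |
    # +---+----------+-------------------+
    # |ABC|1646732321|2022-03-08 10:38:41|
    # |DFG|1646768324|2022-03-08 20:38:44|
    # +---+----------+-------------------+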