java apache-spark hive timestamp

Spark timestamp format with timezone issue


I have this code:

String timestampFormat = "yyyy-MM-dd'T'HH:mm:ssXXX";

or

String timestampFormat = "yyyy-MM-dd'T'HH:mm:ssZZZZZ";

Dataset<Row> inputDataFrame = spark.read()
            .format("CSV")
            .option("timestampFormat", timestampFormat)
            .load("path/file"); // placeholder path

The value 2022-04-05T08:19:00+00:00 is loaded into the Hive table as 05.04.2022 10:19:00, a 2-hour difference. It should be 05.04.2022 08:19:00. Can someone tell me which format I should use?


Solution

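  • Spark renders timestamps in the session time zone, which defaults to the JVM's local time zone, so 08:19 UTC shows up as 10:19 when that zone is UTC+2. The format string already parses the offset correctly. Set the Spark SQL session time zone as shown below and rerun the job.

    --conf "spark.sql.session.timeZone=UTC"   # change UTC to your time zone

    or

    spark.conf().set("spark.sql.session.timeZone", "UTC"); // Java API; change UTC to your time zone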
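
The 2-hour shift can be reproduced without Spark, using only java.time. The sketch below parses the value from the question with the first pattern and renders the same instant in two zones. The class name SessionZoneDemo and the zone Europe/Berlin are assumptions for illustration only (Berlin is UTC+2 on that date); the asker's actual zone is not stated.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SessionZoneDemo {
    // Input pattern from the question (with a four-letter year).
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX");

    // Output pattern matching how the value is displayed in the question.
    private static final DateTimeFormatter DISPLAY =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss");

    // Parses an offset timestamp and renders the same instant in the given zone.
    static String render(String timestamp, String zoneId) {
        return OffsetDateTime.parse(timestamp, INPUT)
                .atZoneSameInstant(ZoneId.of(zoneId))
                .format(DISPLAY);
    }

    public static void main(String[] args) {
        String value = "2022-04-05T08:19:00+00:00";
        System.out.println(render(value, "UTC"));           // 05.04.2022 08:19:00
        System.out.println(render(value, "Europe/Berlin")); // 05.04.2022 10:19:00 (assumed zone)
    }
}
```

The pattern reads the +00:00 offset correctly in both cases; only the zone used for display changes the wall-clock value, which is why setting the session time zone addresses the difference.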