scala, apache-spark, utc, timestamp-with-timezone

Change the timestamp to UTC format in Spark using Scala


This question is similar to: Change the timestamp to UTC format in Pyspark

Basically, I want to convert a timestamp string in ISO 8601 format with an offset into a UTC timestamp string (2017-08-01T14:30:00+05:30 -> 2017-08-01T09:00:00+00:00) using Scala.

I am fairly new to Scala/Java. I checked the Spark library and, as far as I can tell, it has no way to do the conversion without knowing the time zone, which I don't have unless I parse it out of the string in an ugly way or use a Java/Scala library. Can someone help?

UPDATE: A better way to do this: set the session time zone in Spark (the spark.sql.session.timeZone setting) and cast the string column with cast(DataTypes.TimestampType) to do the time zone shift.
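
The approach in the update could look roughly like the sketch below. It assumes Spark 2.2 or later (where the spark.sql.session.timeZone setting is available) and a local SparkSession; the column names ts_string and ts_utc are made up for illustration and are not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DataTypes

// Assumption: a throwaway local session, only for demonstration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("utc-cast-sketch")
  .getOrCreate()
import spark.implicits._

// Interpret and display timestamps in UTC for this session.
spark.conf.set("spark.sql.session.timeZone", "UTC")

// Hypothetical input: one ISO 8601 string that carries its own offset.
val df = Seq("2017-08-01T14:30:00+05:30").toDF("ts_string")

// The cast reads the offset from the string; the result is an instant
// that Spark renders in the session time zone set above.
val withUtc = df.withColumn("ts_utc", col("ts_string").cast(DataTypes.TimestampType))

withUtc.show(false)
```

Note that the result is a TimestampType column, not a string. If a specific string layout is needed, the column still has to be formatted afterwards.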


Solution

  • You can use the java.time primitives to parse and convert your timestamp.

    scala> import java.time.{OffsetDateTime, ZoneOffset}
    import java.time.{OffsetDateTime, ZoneOffset}
    
    scala> val datetime = "2017-08-01T14:30:00+05:30"
    datetime: String = 2017-08-01T14:30:00+05:30
    
    scala> OffsetDateTime.parse(datetime).withOffsetSameInstant(ZoneOffset.UTC)
    res44: java.time.OffsetDateTime = 2017-08-01T09:00Z
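
The REPL output above prints the UTC offset as "Z" and drops the zero seconds, while the question asks for the form 2017-08-01T09:00:00+00:00. A minimal sketch of one way to get that exact layout is to format the converted value with an explicit DateTimeFormatter pattern. The object and method names here (UtcTimestamps, toUtcString) are hypothetical helpers, not part of the original answer.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object UtcTimestamps {
  // Lowercase "xxx" prints a zero offset as "+00:00";
  // uppercase "XXX" would print "Z" instead.
  private val utcFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssxxx")

  // Parses an ISO 8601 string with an offset, shifts it to UTC
  // (same instant), and renders it with an explicit "+00:00" offset.
  def toUtcString(iso: String): String =
    OffsetDateTime.parse(iso)
      .withOffsetSameInstant(ZoneOffset.UTC)
      .format(utcFormat)
}

println(UtcTimestamps.toUtcString("2017-08-01T14:30:00+05:30"))
```

To apply this per row in Spark, a function like this could be wrapped with org.apache.spark.sql.functions.udf, although built-in column functions are usually preferable where they cover the case.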