Tags: scala, apache-spark, epoch

Spark/Scala - unix_timestamp returning the wrong date?


I have a piece of Spark code that looks like this:

df //existing dataframe
  .withColumn("input_date", lit("20190105"))
  .withColumn("input_date_epoch", unix_timestamp(col("input_date"), "YYYYMMdd"))

Now, when I run df.describe, the returned data shows the input_date_epoch column with all values as 1546128000, which an epoch converter resolves to 2018-12-30 00:00:00 rather than the expected 2019-01-05 00:00:00.
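
For reference, the epoch value can be double-checked with plain java.time (no Spark involved):

import java.time.Instant

// 1546128000 seconds since the epoch is 2018-12-30 at midnight UTC
println(Instant.ofEpochSecond(1546128000L)) // prints 2018-12-30T00:00:00Z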

Am I doing something wrong here?


Solution

  • The pattern is wrong; if you want a calendar year with four digits, use yyyy:

    import org.apache.spark.sql.functions.{col, lit, unix_timestamp}

    spark.range(5)
      .withColumn("input_date", lit("20190105"))
      .withColumn("input_date_epoch", unix_timestamp(col("input_date"), "yyyyMMdd"))
    

    YYYY actually refers to the week year, not the calendar year; see the java.text.SimpleDateFormat pattern documentation.
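
    To see the difference in isolation, here is a minimal sketch using java.text.SimpleDateFormat directly (the legacy parser that older Spark versions delegate to for unix_timestamp). The commented epoch values assume UTC and a US-style week definition (weeks starting on Sunday), so the week-year date can shift by a day under other locale settings:

    import java.text.SimpleDateFormat
    import java.util.TimeZone

    val weekYearFmt = new SimpleDateFormat("YYYYMMdd") // 'Y' = week year
    val calYearFmt  = new SimpleDateFormat("yyyyMMdd") // 'y' = calendar year
    weekYearFmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    calYearFmt.setTimeZone(TimeZone.getTimeZone("UTC"))

    // With 'Y' the month and day fields are effectively ignored and the parse
    // snaps to the first day of week 1 of week year 2019:
    println(weekYearFmt.parse("20190105").getTime / 1000) // 1546128000 -> 2018-12-30 00:00:00 UTC
    // With 'y' the string is parsed as the calendar date it looks like:
    println(calYearFmt.parse("20190105").getTime / 1000)  // 1546646400 -> 2019-01-05 00:00:00 UTC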