Tags: scala, date, apache-spark, apache-spark-sql, to-date

Convert date to another format Scala Spark


I am reading a CSV that contains dates in two formats:

  • dd-MMM-yyyy hh:mm:ss -> 13-Dec-2019 17:10:00
  • dd/MM/yyyy hh:mm -> 11/02/2020 17:33

I am trying to convert all dates of the first format into the second format, but I can't find a good solution. This is my attempt:

val pr_date = readeve.withColumn("Date", when(to_date(col("Date"),"dd-MMM-yyyy hh:mm:ss").isNotNull,
      to_date(col("Date"),"dd/MM/yyyy hh:mm")))
pr_date.show(25)

And I get the entire Date column as null values.

I am trying with this function:

 def to_date_(col: Column,
            formats: Seq[String] = Seq("dd-MMM-yyyy hh:mm:ss", "dd/MM/yyyy hh:mm")) = {
    coalesce(formats.map(f => to_date(col, f)): _*)
 }
 
 val p2 = readeve.withColumn("Date",to_date_(readeve.col(("Date")))).show(125)

And for the first date format I get nulls too.

What am I doing wrong? (new with Scala Spark)

Scala version: 2.11.7, Spark version: 2.4.3


Solution

  • Try the code below. Note that an hour like 17 requires HH (24-hour clock), not hh (12-hour), so your patterns never match. Also use to_timestamp instead of to_date, because to_date drops the time portion and you want to keep it; date_format then renders the parsed timestamp back as a string in the target format.

    val pr_date = readeve.withColumn(
        "Date",
        coalesce(
            // Parse the first format; returns null if the string doesn't match it
            date_format(to_timestamp(col("Date"),"dd-MMM-yyyy HH:mm:ss"),"dd/MM/yyyy HH:mm"),
            // Fall back to the second format (already the target layout)
            date_format(to_timestamp(col("Date"),"dd/MM/yyyy HH:mm"),"dd/MM/yyyy HH:mm")
        )
    )
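
  • You can sanity-check the two patterns outside Spark with plain java.time (a standalone sketch, not part of the Spark job; Spark 2.4 actually parses with SimpleDateFormat, but these particular patterns behave the same way). The English month name "Dec" only parses under an English locale, which is worth knowing if your cluster's default locale differs:

    ```scala
    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter
    import java.util.Locale

    object PatternCheck extends App {
      // Input pattern: 24-hour clock (HH), English month abbreviation
      val in  = DateTimeFormatter.ofPattern("dd-MMM-yyyy HH:mm:ss", Locale.ENGLISH)
      // Output pattern: the target dd/MM/yyyy HH:mm layout
      val out = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm")

      val parsed = LocalDateTime.parse("13-Dec-2019 17:10:00", in)
      println(out.format(parsed)) // 13/12/2019 17:10
    }
    ```

    With hh instead of HH, parsing "17:10:00" throws a DateTimeParseException, which is exactly why the Spark expressions in the question produced nulls (Spark swallows the parse failure and returns null instead of throwing).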