Tags: scala, date, apache-spark, udf

Spark scala udf error for if else


I am trying to define a UDF with the function getTime for Spark Scala, but I am getting the error error: illegal start of declaration. What might be wrong in the syntax? I want to return the date, and if there is a parse exception, return some error string instead of null.

def getTime = udf((x: String): java.sql.Timestamp => {
  if (x.toString() == "") return null
  else {
    val format = new SimpleDateFormat("yyyy-MM-dd' 'HH:mm:ss")
    val d = format.parse(x.toString())
    val t = new Timestamp(d.getTime())
    return t
  }
})

Thank you!


Solution

  • The return type of the udf is inferred and should not be annotated on the lambda. Change the first line of code to:

    def getTime = udf((x: String) => {
    // your code
    })
    

    This should get rid of the error. Also drop the `return` statements: in Scala the last expression of a block is its value, and `return` inside an anonymous function does not behave the way you might expect.
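    If you do want the result type pinned down explicitly, Spark's `udf` has overloads that take the return and argument types as type parameters rather than as an annotation on the lambda. A minimal sketch (the body is a placeholder):

    ```scala
    import java.sql.Timestamp
    import org.apache.spark.sql.functions.udf

    // udf[ReturnType, ArgType] lets the compiler check the result type
    // without the illegal `(x: String): Timestamp =>` annotation.
    val getTime = udf[Timestamp, String] { x =>
      // your parsing logic here, yielding a Timestamp (or null)
      null
    }
    ```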

    The following is fully working code, written in a functional style using Scala's Option:

    import java.text.SimpleDateFormat
    import java.sql.Timestamp
    import spark.implicits._ // for createDataset and the $"..." column syntax
    
    val data: Seq[String] = Seq("", null, "2017-01-15 10:18:30")
    val ds = spark.createDataset(data)
    
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    // The udf, completely rewritten: empty and null inputs map to null,
    // everything else is parsed into a Timestamp.
    val f = udf((input: String) => {
      Option(input).filter(_.nonEmpty).map(str => new Timestamp(fmt.parse(str).getTime)).orNull
    })
    
    val ds2 = ds.withColumn("parsedTimestamp", f($"value"))
    

    The output is:

    +-------------------+--------------------+
    |              value|     parsedTimestamp|
    +-------------------+--------------------+
    |                   |                null|
    |               null|                null|
    |2017-01-15 10:18:30|2017-01-15 10:18:...|
    +-------------------+--------------------+
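    The question also asks for an error string instead of null when parsing fails. A UDF column must have a single type, so one option is to return a String column holding either the formatted timestamp or an error message. A hedged sketch building on the code above (`fWithError` and the error message format are illustrative names, not part of the original answer):

    ```scala
    import java.text.SimpleDateFormat
    import java.sql.Timestamp
    import scala.util.Try

    val fmt2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    // Returns the parsed timestamp as a String, or an error message when
    // SimpleDateFormat.parse throws; empty/null inputs still become null.
    val fWithError = udf((input: String) =>
      Option(input).filter(_.nonEmpty).map { str =>
        Try(new Timestamp(fmt2.parse(str).getTime).toString)
          .getOrElse(s"could not parse: $str")
      }.orNull
    )

    val ds3 = ds.withColumn("parsedOrError", fWithError($"value"))
    ```

    Returning a String loses the Timestamp type, so if downstream code needs a real timestamp column you would instead keep the null-returning version and collect failures separately.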