Search code examples
scalaapache-sparkdateadd

Spark Scala - timestamp into date_add()


I'm having some trouble with something that must be simple.. but I'm still in the early stages of learning.

The problem is: I wanna get the start date of a project and then add 20 days to it. I wanna do it in an automated way, not manually adding it (like.. although I know it started on 2019-06-01 I don't want to manually input the date "2019-06-21" because I might change it later and I guess it would be just lazy of me to do it like this).

So, to get the start date I was doing:

val start_date = table.select(date_trunc("day", min('applied_at)).as("start_date"))

Which was returning a

start_date: org.apache.spark.sql.DataFrame = [start_date: timestamp]

However, if I try to use it on the function date_add() I get an error.

val objective = date_add(start_date, 20)
error: type mismatch;
 found   : org.apache.spark.sql.DataFrame
    (which expands to)  org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 required: org.apache.spark.sql.Column
                           && date_trunc("day",$"applied_at").as("applied_at") < date_add(start_date, projection_proxy)
                                                                                          ^ 

So I've tried other formats and playing with it a bit, but couldn't find an answer by myself. Must be because I don't have the necessary basic knowledge.

I get it's asking me for a Column, but can I just transform a variable into a column? Should I approach this problem in a different way?


Solution

  • As the error suggest date_add function accepts col datatype however you are trying to pass the whole Dataframe.

    When you apply a select on an existing dataframe you get an a new dataframe, regardless of how many column or row it has.

    To Add Days to Date column

    df = start_date.withColumn("new_date", date_add($"start_date", 20))
    

    if you just want to have single column, or only column with new date

    val newDateDF = minDateDF.
      withColumn("new_date", date_add($"start_date", 1))
      .drop($"start_date")
      .select($"new_date".alias("start_date"))
    

    if you want to have this as a Value instead

    val new_date = newDateDF.select($"start_date").collect()(0)(0).toString()