Tags: java, dataframe, apache-spark, current-time

Getting the current timestamp of each row of a DataFrame using Spark / Java


I want to get the current timestamp of each row.

I use the following code:

dataframe.withColumn("current_date",current_timestamp());

But current_timestamp() is evaluated once, prior to serialisation, so every row always gets the same timestamp.
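To picture the difference in plain Java, here is a small sketch that does not use Spark at all. It is only an analogy for the behaviour described above: one method reads the clock once and reuses the value, the other reads it again for every row. The class name `OncePerQueryVsPerRow`, both method names, and the `clock` parameter are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only; not Spark code.
class OncePerQueryVsPerRow {

    // Analogy for current_timestamp(): the clock is read once, up front,
    // and the same value is attached to every row.
    static List<Long> stampOnce(int rows, LongSupplier clock) {
        long fixed = clock.getAsLong();
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(fixed);
        }
        return out;
    }

    // Analogy for per-row evaluation: the clock is read again for each row.
    static List<Long> stampPerRow(int rows, LongSupplier clock) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(clock.getAsLong());
        }
        return out;
    }

    public static void main(String[] args) {
        // All values in the first list are identical; the second list
        // holds one fresh clock reading per row.
        System.out.println(stampOnce(3, System::nanoTime));
        System.out.println(stampPerRow(3, System::nanoTime));
    }
}
```

What I am after is the second behaviour, but inside a DataFrame column expression.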

How can I evaluate current_timestamp() for each row of the DataFrame?

I need your help.

Thank you.


Solution

  • Try reflect, which calls a static Java method while each row is processed:

    
        // requires: import static org.apache.spark.sql.functions.expr;
        df2.withColumn("current_date", expr("reflect('java.lang.System', 'currentTimeMillis')"))
          .show(false);
    
        /**
          * +-----+------+-------------+
          * |class|gender|current_date |
          * +-----+------+-------------+
          * |1    |m     |1594137247247|
          * |1    |m     |1594137247247|
          * |1    |f     |1594137247247|
          * |2    |f     |1594137247272|
          * |2    |f     |1594137247272|
          * |3    |m     |1594137247272|
          * |3    |m     |1594137247272|
          * +-----+------+-------------+
          */
    
        df2.withColumn("current_date", expr("reflect('java.time.LocalDateTime', 'now')"))
          .show(false);
    
        /**
          * +-----+------+-----------------------+
          * |class|gender|current_date           |
          * +-----+------+-----------------------+
          * |1    |m     |2020-07-07T21:24:07.377|
          * |1    |m     |2020-07-07T21:24:07.378|
          * |1    |f     |2020-07-07T21:24:07.378|
          * |2    |f     |2020-07-07T21:24:07.398|
          * |2    |f     |2020-07-07T21:24:07.398|
          * |3    |m     |2020-07-07T21:24:07.398|
          * |3    |m     |2020-07-07T21:24:07.398|
          * +-----+------+-----------------------+
          */
        // The ISO-8601 string from the second example can be converted to a timestamp
        // by casting the current_date column to "timestamp".
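For readers wondering what `reflect` amounts to, the following plain-Java sketch (no Spark involved) looks up a no-argument static method by name, invokes it, and returns the result as a string, which as far as I know is also the type Spark's `reflect` yields. It then shows how the two kinds of string seen in the outputs above could be turned into a `java.sql.Timestamp` on the Java side. The class `ReflectSketch` and its three helper methods are hypothetical names introduced for this example, not part of any Spark API.

```java
import java.lang.reflect.Method;
import java.sql.Timestamp;
import java.time.LocalDateTime;

// Hypothetical helper class; a sketch of the idea, not Spark's implementation.
class ReflectSketch {

    // Looks up a public static no-argument method by name, invokes it,
    // and returns its result as a string.
    static String reflectStatic(String className, String methodName) {
        try {
            Method method = Class.forName(className).getMethod(methodName);
            return String.valueOf(method.invoke(null));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot call " + className + "." + methodName, e);
        }
    }

    // Converts an epoch-millisecond string such as "1594137247247".
    static Timestamp fromEpochMillis(String millis) {
        return new Timestamp(Long.parseLong(millis));
    }

    // Converts an ISO-8601 local date-time string such as
    // "2020-07-07T21:24:07.377", using the JVM's default time zone.
    static Timestamp fromIsoLocalDateTime(String iso) {
        return Timestamp.valueOf(LocalDateTime.parse(iso));
    }

    public static void main(String[] args) {
        String millis = reflectStatic("java.lang.System", "currentTimeMillis");
        String now = reflectStatic("java.time.LocalDateTime", "now");
        System.out.println(millis + " -> " + fromEpochMillis(millis));
        System.out.println(now + " -> " + fromIsoLocalDateTime(now));
    }
}
```

Because the method is invoked each time the helper is called, every call reads the clock afresh, which matches the varying values in the outputs above.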