
How to Convert Date in "01MAR1978:00:00:00" string format to Date Format in SparkR?


I have dates stored as strings in the following format:

  1. 08MAR1978:00:00:00
  2. 10FEB1973:00:00:00
  3. 15AUG1982:00:00:00

I would like to convert them to:

  1. 1978-03-08
  2. 1973-02-10
  3. 1982-08-15

I have tried the following in SparkR:

period_uts <- unix_timestamp(all.new$DATE_OF_BIRTH, '%d%b%Y:%H:%M:%S')
period_ts <- cast(period_uts, 'timestamp')
period_dt <- cast(period_ts, 'date')
df <- withColumn(all.new, 'p_dt', period_dt)    

But when I do this, all the dates get changed into "NA".

Can anyone explain how to convert strings in %d%b%Y:%H:%M:%S format to dates in SparkR?

Thanks!


Solution

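  • I figured out how to do it:

    all.new <- all.new %>% withColumn("Date_of_Birth_Fixed", to_date(.$DATE_OF_BIRTH, "ddMMMyyyy"))

    This works in Spark 2.2.x. Note that the %>% pipe is not part of SparkR itself; it has to be loaded from a package such as magrittr.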
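
For anyone wondering why the original attempt returned NA: as far as I can tell, the date functions in Spark expect Java-style pattern letters (dd, MMM, yyyy, HH, mm, ss), not the strptime-style % codes that base R uses, so a pattern like '%d%b%Y:%H:%M:%S' does not match and the result is null. Below is a minimal sketch of the same conversion without the pipe. It assumes Spark 2.2 or later with SparkR installed, and the small data frame is only a stand-in for all.new built from the sample values in the question.

```r
library(SparkR)

# Start a local Spark session (assumes Spark 2.2+ and SparkR are installed).
sparkR.session(master = "local[1]")

# Hypothetical stand-in for all.new, built from the sample values in the question.
all.new <- createDataFrame(data.frame(
  DATE_OF_BIRTH = c("08MAR1978:00:00:00",
                    "10FEB1973:00:00:00",
                    "15AUG1982:00:00:00"),
  stringsAsFactors = FALSE
))

# Java-style pattern that covers the whole string, including the time part.
pattern <- "ddMMMyyyy:HH:mm:ss"

# to_date(Column, format) parses the string and keeps only the date.
df <- withColumn(all.new, "p_dt", to_date(all.new$DATE_OF_BIRTH, pattern))

# Bring the small result back to the driver as a local data.frame.
result <- collect(df)
print(result)

sparkR.session.stop()
```

The pattern here spells out the time part as well, so it does not depend on the parser ignoring the trailing ":00:00:00". The shorter "ddMMMyyyy" pattern worked for me on 2.2.x, but I have not checked it on newer Spark versions. The same Java-style pattern should also work in the original unix_timestamp call in place of the % codes.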