Tags: r, apache-spark, apache-spark-sql, sparkr

Convert double to date using SparkR


When a date is a string, the SparkR function to_date converts it to a date correctly, but when it is a double, it doesn't.

In plain R, the zoo package handles this (its as.Date method for numerics assumes the Unix epoch as the origin):

library(zoo)

df <- tibble::tibble(date = 17000.0)

as.Date(df$date)

# [1] "2016-07-18"
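(For comparison, base R can do the same without zoo, provided you pass an explicit origin — a small sketch:)

```r
# Base R: as.Date() on a numeric needs an explicit origin;
# zoo's as.Date.numeric() simply defaults it to "1970-01-01".
as.Date(17000, origin = "1970-01-01")
# [1] "2016-07-18"
```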

However, the equivalent SparkR call fails:

df_spark <- as.DataFrame(df)
df_spark <- df_spark %>% 
  SparkR::mutate(date2 = to_date(df_spark$date, "MM/dd/yyyy"))

# org.apache.spark.sql.AnalysisException: cannot resolve 'unix_timestamp(`date`, 'MM/dd/yyyy')' due to data type mismatch: argument 1 requires (string or date or timestamp) type, however, '`date`' is of double type.

I need the SparkR solution to convert double into a date.


Solution

  • If the value should be interpreted as days since the Unix epoch, use date_add:

    df_spark %>% 
      SparkR::mutate(date2 = expr("date_add(to_date('1970-01-01'), date)")) %>% 
      head()
    
       date      date2                                                              
    1 17000 2016-07-18
    

    Basic arithmetic should work as well (convert days to seconds, cast to timestamp, then take the date):

    df_spark %>% 
      SparkR::mutate(
        date2 = to_date(cast(column("date") * 60 * 60 * 24, "timestamp"))) %>% 
      head()
    
       date      date2
    1 17000 2016-07-18
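
    The same seconds-based arithmetic can be verified in plain R (a sketch, assuming UTC so no day boundary shifts):

    ```r
    # Multiply days by 86400 to get seconds since the epoch,
    # then truncate the resulting timestamp back to a date (UTC).
    secs <- 17000 * 60 * 60 * 24
    as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"))
    # [1] "2016-07-18"
    ```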