
Use numeric variable input as days to make date_add work in SparkR


I have a number of days and a reference date, and I want to use them to compute the resulting date in SparkR. Here is some toy data and code:

library(magrittr)
library(SparkR)
sparkR.session()  # start (or reuse) a Spark session; as.DataFrame needs one

df <- tibble::tribble(
  ~days,  ~date,
  17000L, "1970-01-01",
  17200L, "1970-01-01")
df_spark <- SparkR::as.DataFrame(df)

This works:

df_spark <- df_spark %>% 
  SparkR::mutate(date2 = date_add(to_date(df_spark$date), 17000))

But this doesn't:

df_spark <- df_spark %>% 
  SparkR::mutate(date2 = date_add(to_date(df_spark$date), df_spark$days))   

It throws an error:

unable to find an inherited method for function ‘date_add’ for signature ‘"Column", "Column"’

I want to pass the "days" column as the second argument to date_add instead of a literal number, because "days" takes many different values. How can I do that? If it isn't possible with date_add, what is the alternative in SparkR?
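For reference, the result being asked for is ordinary date arithmetic: in base R, adding an integer vector to a Date shifts each element by that many days. The sketch below (plain R, no Spark session needed; it uses data.frame instead of tibble::tribble only to avoid the tibble dependency) computes the dates SparkR should return for the toy data:

```r
# Same toy data as above, built with base R
df <- data.frame(
  days = c(17000L, 17200L),
  date = c("1970-01-01", "1970-01-01"),
  stringsAsFactors = FALSE
)

# Date + integer adds that many days, element-wise
df$date2 <- as.Date(df$date) + df$days
print(df)
# date2: "2016-07-18" "2017-02-03"
```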


Solution

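  • SparkR's date_add only has a method for a Column plus a numeric literal, which is why the "Column", "Column" signature fails. The Spark SQL function date_add, however, accepts a column as the number of days, so instead of calling SparkR's date_add directly, wrap the SQL expression in expr: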

    expressiondf_spark <- df_spark %>% 
      SparkR::mutate(date2 = expr("date_add(to_date(date), days)"))
    
    expressiondf_spark %>% head()
    
       days       date      date2                                                   
    1 17000 1970-01-01 2016-07-18
    2 17200 1970-01-01 2017-02-03
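The same idea (let Spark SQL, not the SparkR wrapper, evaluate date_add) can also be written with selectExpr or with a SQL query against a temporary view. This is a sketch assuming a working local Spark installation; the view name "toy_dates" is illustrative, and the backticks around `date` are a precaution because date is also a SQL type keyword:

```r
library(SparkR)

# Assumes Spark is installed; reuses an existing session if one is running
sparkR.session()

df <- data.frame(
  days = c(17000L, 17200L),   # integer column -> Spark IntegerType
  date = c("1970-01-01", "1970-01-01"),
  stringsAsFactors = FALSE
)
df_spark <- as.DataFrame(df)

# Option 1: selectExpr takes SQL expression strings, so date_add can read "days"
res1 <- selectExpr(df_spark, "days", "`date`",
                   "date_add(to_date(`date`), days) AS date2")

# Option 2: register a temporary view and write the query in Spark SQL
createOrReplaceTempView(df_spark, "toy_dates")  # hypothetical view name
res2 <- sql("SELECT days, `date`, date_add(to_date(`date`), days) AS date2
             FROM toy_dates")

head(res1)
head(res2)
```

All three forms (expr, selectExpr, sql) hand the expression string to the Spark SQL parser, so any Spark SQL function that accepts column arguments can be used the same way.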