python, pyspark, databricks, dateadd

How to get the date when using date_add in pyspark


I am trying to get the result of the date_add function in PySpark, but it always returns a Column. To see the actual result I have to add it as a column to a DataFrame, but I want the result stored in a variable. How can I store the resulting date?

from pyspark.sql.functions import date_add

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
r = date_add(df.dt, 1)
print(r)
output:- Column<'date_add(dt, 1)'>

But I want output like below

output:- datetime.date(2015, 4, 9)
or 
'2015-04-09'

Solution

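  • date_add returns a Column expression, which Spark evaluates only inside a DataFrame operation such as withColumn or select. If all you need is the date itself, consider a non-Spark approach using datetime and timedelta.

    Alternatively, if your use case requires Spark, add the column and bring the result back to the driver with collect, like so: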

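    r = df.withColumn('new_col', date_add(col('dt'), 1)).select('new_col').collect()
    result = r[0]['new_col']  # collect() returns a list of Row objects; this value is datetime.date(2015, 4, 9)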
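
The non-Spark approach mentioned above could look roughly like the sketch below. It assumes the input is a string in 'YYYY-MM-DD' form, as in the question. The helper name add_days is made up for illustration and is not part of PySpark or the standard library.

```python
from datetime import date, datetime, timedelta


def add_days(date_str: str, days: int) -> date:
    """Parse a 'YYYY-MM-DD' string and return the date `days` later.

    Hypothetical helper for illustration only.
    """
    parsed = datetime.strptime(date_str, "%Y-%m-%d").date()
    return parsed + timedelta(days=days)


result = add_days("2015-04-08", 1)
print(repr(result))        # datetime.date(2015, 4, 9)
print(result.isoformat())  # 2015-04-09
```

This runs entirely on the driver with no Spark job, so it only makes sense when the date is a single known value rather than a column of data.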