Tags: dataframe, apache-spark, pyspark, spark-streaming, apache-spark-sql

How do I create a column with the date 3 years before a given date column in PySpark?


I want to use PySpark to create a column containing the date that is 3 years before the date in a given column. The date column looks like this:

        date
        2018-08-01   
        2016-08-11
        2014-09-18
        2018-12-08
        2011-12-18

And I want this result:

        date         past_date
        2018-08-01   2015-08-01
        2016-08-11   2013-08-11
        2014-09-18   2011-09-18
        2018-12-08   2015-12-08
        2011-12-18   2008-12-18

Solution

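  • Use the add_months function with -36 months. Subtracting a fixed number of days with date_sub(col("date"), 1095) is off by one day whenever a 29 February falls inside the three-year span, which is the case for four of the five sample rows (2018-08-01 would become 2015-08-02 instead of 2015-08-01).

    Here is Scala code; the Python version is very similar.

    import org.apache.spark.sql.functions.{add_months, col}
    df.withColumn("past_date", add_months(col("date"), -36))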
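To see why a fixed day count drifts, the sketch below reproduces both calculations on the question's sample dates with plain java.time, so no Spark session is needed. It assumes that date_sub(col, 1095) behaves like LocalDate.minusDays(1095) and that add_months(col, -36) in Spark 3.x behaves like LocalDate.minusMonths(36); the helper names minusFixedDays and minusThreeYears are hypothetical and exist only for this illustration.

```scala
import java.time.LocalDate

// Sample values copied from the question's "date" column.
val dates: Seq[LocalDate] =
  Seq("2018-08-01", "2016-08-11", "2014-09-18", "2018-12-08", "2011-12-18")
    .map(s => LocalDate.parse(s))

// Hypothetical helper: models date_sub(col("date"), 1095),
// i.e. subtracting a fixed number of days regardless of leap years.
def minusFixedDays(d: LocalDate): LocalDate = d.minusDays(1095)

// Hypothetical helper: models add_months(col("date"), -36) (assumed Spark 3.x
// semantics), i.e. the same calendar day 36 months earlier.
def minusThreeYears(d: LocalDate): LocalDate = d.minusMonths(36)

// Print both results side by side for each sample date.
dates.foreach { d =>
  println(s"$d  1095 days: ${minusFixedDays(d)}  36 months: ${minusThreeYears(d)}")
}
```

For 2018-08-01, for example, the 1095-day version gives 2015-08-02 while the 36-month version gives the expected 2015-08-01. Only 2011-12-18 agrees both ways, because no 29 February falls between 2008-12-18 and 2011-12-18.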