Here is my dataframe :
FlightDate=[20,40,51,50,60,15,17,37,36,50]
IssuingDate=[10,15,44,45,55,10,2,30,32,24]
Revenue = [100,50,40,70,60,40,30,100,200,100]
Customer = ['a','a','a','a','a','b','b','b','b','b']
df = spark.createDataFrame(pd.DataFrame([Customer,FlightDate,IssuingDate, Revenue]).T, schema=["Customer",'FlightDate', 'IssuingDate','Revenue'])
df.show()
+--------+----------+-----------+-------+
|Customer|FlightDate|IssuingDate|Revenue|
+--------+----------+-----------+-------+
| a| 20| 10| 100|
| a| 40| 15| 50|
| a| 51| 44| 40|
| a| 50| 45| 70|
| a| 60| 55| 60|
| b| 15| 10| 40|
| b| 27| 2| 30|
| b| 37| 30| 100|
| b| 36| 32| 200|
| b| 50| 24| 100|
+--------+----------+-----------+-------+
For convenience, I used number for days.
For each customer, I would like to sum revenues for all issuing dates between studied FlightDate and studied FlightDate + 10 days.
That is to say :
Here is the desired result :
+--------+----------+-----------+-------+------+
|Customer|FlightDate|IssuingDate|Revenue|Result|
+--------+----------+-----------+-------+------+
| a| 20| 10| 100| 0|
| a| 40| 15| 50| 110|
| a| 51| 44| 40| 60|
| a| 50| 45| 70| 60|
| a| 60| 55| 60| 0|
| b| 15| 10| 40| 100|
| b| 27| 2| 30| 300|
| b| 37| 30| 100| 0|
| b| 36| 32| 200| 0|
| b| 50| 24| 100| 0|
+--------+----------+-----------+-------+------+
I know it will involve some window functions but this one seems a bit tricky. Thanks
no need of a window function. It is just a join and an agg :
df.alias("df").join(
df.alias("df_2"),
on=F.expr(
"df.Customer = df_2.Customer "
"and df_2.issuingdate between df.flightdate and df.flightdate+10"
),
how='left'
).groupBy(
*('df.{}'.format(c)
for c
in df.columns)
).agg(
F.sum(F.coalesce(
"df_2.revenue",
F.lit(0))
).alias("result")
).show()
+--------+----------+-----------+-------+------+
|Customer|FlightDate|IssuingDate|Revenue|result|
+--------+----------+-----------+-------+------+
| a| 20| 10| 100| 0|
| a| 40| 15| 50| 110|
| a| 50| 45| 70| 60|
| a| 51| 44| 40| 60|
| a| 60| 55| 60| 0|
| b| 15| 10| 40| 100|
| b| 27| 2| 30| 300|
| b| 36| 32| 200| 0|
| b| 37| 30| 100| 0|
| b| 50| 24| 100| 0|
+--------+----------+-----------+-------+------+