PySpark: not cumulative sum over partition


I would like to sum over a partition, but instead of a cumulative sum I want each row to carry the total sum of its partition:

From:

Category A   Category B   Value
1            2            100
1            2            150
2            1            110
2            2            200

I would like to have:

Category A   Category B   Value   Sum
1            2            100     250
1            2            150     250
2            1            110     110
2            2            200     200

With:

from pyspark.sql.functions import sum
from pyspark.sql.window import Window
windowSpec = Window.partitionBy(["Category A","Category B"])
df = df.withColumn('sum', sum(df.Value).over(windowSpec))

I don't obtain the result I want; instead, I obtain the cumulative sum:

Category A   Category B   Value   Sum
1            2            100     100
1            2            150     250
2            1            110     110
2            2            200     200

How could I proceed? Thank you


Solution

  • When defining a window, you can also specify the frame, i.e. the range of rows it covers.

    Using the frame (Window.unboundedPreceding, Window.unboundedFollowing) makes the sum run over all rows within each partition, irrespective of the ordering of the rows:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    windowSpec = Window.partitionBy(["Category A", "Category B"])\
        .rangeBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    df.withColumn('sum', F.sum(df.Value).over(windowSpec))\
        .orderBy("Category A", "Category B").show()
    

    prints

    +----------+----------+-----+-----+
    |Category A|Category B|Value|  sum|
    +----------+----------+-----+-----+
    |         1|         2|  100|250.0|
    |         1|         2|  150|250.0|
    |         2|         1|  110|110.0|
    |         2|         2|  200|200.0|
    +----------+----------+-----+-----+
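
    For reference, here is a minimal end-to-end sketch; the SparkSession setup and the inline example data are assumptions added for illustration and are not part of the original question:

    # Hypothetical standalone script reproducing the question's data and the
    # unbounded-frame window sum shown above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.master("local[*]").appName("partition-sum").getOrCreate()

    df = spark.createDataFrame(
        [(1, 2, 100), (1, 2, 150), (2, 1, 110), (2, 2, 200)],
        ["Category A", "Category B", "Value"],
    )

    # An unbounded frame makes every row of the partition contribute to the
    # aggregate, so each row carries the total of its (Category A, Category B) group.
    windowSpec = Window.partitionBy("Category A", "Category B")\
        .rangeBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    df.withColumn("sum", F.sum("Value").over(windowSpec))\
        .orderBy("Category A", "Category B")\
        .show()

    If the per-row window column is not needed, the same totals can also be computed with a plain aggregation, e.g. df.groupBy("Category A", "Category B").agg(F.sum("Value").alias("sum")), and joined back on the two category columns.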