
What's the default window frame for window functions?


Running the following code:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._

val sales = Seq(
  (0, 0, 0, 5),
  (1, 0, 1, 3),
  (2, 0, 2, 1),
  (3, 1, 0, 2),
  (4, 2, 0, 8),
  (5, 2, 2, 8))
  .toDF("id", "orderID", "prodID", "orderQty")

// window ordered by id, with no frame specified
val orderedByID = Window.orderBy($"id")

val totalQty = sum($"orderQty").over(orderedByID).as("running_total")
val salesTotalQty = sales.select($"*", totalQty).orderBy($"id")
salesTotalQty.show()

The result is:

+---+-------+------+--------+-------------+
| id|orderID|prodID|orderQty|running_total|
+---+-------+------+--------+-------------+
|  0|      0|     0|       5|            5|
|  1|      0|     1|       3|            8|
|  2|      0|     2|       1|            9|
|  3|      1|     0|       2|           11|
|  4|      2|     0|       8|           19|
|  5|      2|     2|       8|           27|
+---+-------+------+--------+-------------+

There is no window frame defined in the code above, yet the result looks as if the default window frame were rowsBetween(Window.unboundedPreceding, Window.currentRow).

I'm not sure whether my understanding of the default window frame is correct.


Solution

  • From Spark Gotchas

    Default frame specification depends on other aspects of a given window definition:

    • if the ORDER BY clause is specified and the function accepts the frame specification, then the frame specification is defined by RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
    • otherwise the frame specification is defined by ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
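So in the question's example, the default frame is actually RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, not rowsBetween(...); the two just happen to agree here because id has no duplicates. Below is a minimal sketch of the difference, assuming a SparkSession in scope as spark (the ties DataFrame and the column names qty, range_sum, rows_sum are made up for illustration):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._

// hypothetical data with a tie on the ordering column (two rows with id = 1)
val ties = Seq((0, 5), (1, 3), (1, 1), (2, 2)).toDF("id", "qty")

val byId = Window.orderBy($"id")

ties.select(
  $"*",
  // RANGE: the frame includes the current row plus all its peers (rows with an equal id)
  sum($"qty").over(byId.rangeBetween(Window.unboundedPreceding, Window.currentRow)).as("range_sum"),
  // ROWS: the frame includes only the physical rows up to and including the current row
  sum($"qty").over(byId.rowsBetween(Window.unboundedPreceding, Window.currentRow)).as("rows_sum")
).show()

// with no ORDER BY, the default frame is the entire partition
// (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING):
ties.select($"*", sum($"qty").over(Window.partitionBy(lit(1))).as("total")).show()

Both rows with id = 1 get range_sum = 9, while rows_sum is 8 and then 9. On the question's data every id is unique, so each row is its own peer group and the two frames produce identical running totals, which is why the output above is indistinguishable from rowsBetween.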