
Window function inside aggregate function error


I get the following error message:

pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query.

How can I rewrite the code below so that it works?

Master_Table_All_2 = Master_Table_All.withColumn(
    "cumulative_paid_in_target_currency_period",
    F.sum(
        Master_Table_All.damage_amount_target_currency_in_period.over(
            Window.partitionBy("key")
                  .orderBy("date_end")
                  .rowsBetween(-sys.maxsize, 0)
        )
    ),
)

Solution

  • Apply `.over(...)` to the aggregate expression `F.sum(...)` rather than to the input column, so the window wraps the aggregate instead of being nested inside it. It is also preferable to use `Window.unboundedPreceding` instead of `-sys.maxsize` for an unbounded frame start:

    Master_Table_All_2 = Master_Table_All.withColumn(
        "cumulative_paid_in_target_currency_period",
        F.sum("damage_amount_target_currency_in_period").over(Window
            .partitionBy("key")
            .orderBy("date_end")
            .rowsBetween(Window.unboundedPreceding, 0)
        )
    )
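
    As a quick sanity check, the corrected pattern can be run against a tiny local DataFrame (the sample data here is illustrative, reusing the question's column names):

    ```python
    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.master("local[1]").appName("cumsum-demo").getOrCreate()

    # Toy data: two keys with payments over time (illustrative values).
    df = spark.createDataFrame(
        [("a", "2020-01-01", 10.0),
         ("a", "2020-02-01", 5.0),
         ("b", "2020-01-01", 7.0)],
        ["key", "date_end", "damage_amount_target_currency_in_period"],
    )

    # Running-total frame: from the first row of each key up to the current row.
    w = (Window.partitionBy("key")
               .orderBy("date_end")
               .rowsBetween(Window.unboundedPreceding, Window.currentRow))

    # .over() is applied to the aggregate expression, not to the input column.
    result = df.withColumn(
        "cumulative_paid_in_target_currency_period",
        F.sum("damage_amount_target_currency_in_period").over(w),
    )
    result.show()
    ```

    For key `a` this yields cumulative values 10.0 then 15.0, and 7.0 for key `b`.
    
    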