Search code examples
sqlduplicateswindow-functionspresto

Handling duplicates when rolling totals using OVER Partition by


I'm trying to get the rolling amount column totals for each date, from the 1st day of the month to whatever the date column value is, shown in the input table.

Output Requirements

  1. Partition by the 'team' column
  2. Restart rolling totals on the 1st of each month

Question 1
Is my below query correct to get my desired output requirements shown in Output Table below? It seems to work but I must confirm.

SELECT
    *,
    SUM(amount) OVER (
        PARTITION BY
            team,
            month_id
        ORDER BY
            date ASC
    ) rolling_amount_total
FROM input_table;

Question 2
How can I handle duplicate dates, shown in the first 2 rows of Input Table? Whenever there is a duplicate date the amount is a duplicate as well. I see a solution here: https://stackoverflow.com/a/60115061/6388651 but no luck getting it to remove the duplicates. My non-working code example is below.

SELECT
    *,
    SUM(amount) OVER (
        PARTITION BY
            team,
            month_id
        ORDER BY
            date ASC
    ) rolling_amount_total
FROM (
    SELECT DISTINCT
        date,
        amount,
        team,
        month_id
    FROM input_table
) t

Input Table

date amount team month_id
2022-04-01 1 A 2022-04
2022-04-01 1 A 2022-04
2022-04-02 2 A 2022-04
2022-05-01 4 B 2022-05
2022-05-02 4 B 2022-05

Desired Output Table

date amount team month_id Rolling_Amount_Total
2022-04-01 1 A 2022-04 1
2022-04-02 2 A 2022-04 3
2022-05-01 4 B 2022-05 4
2022-05-02 4 B 2022-05 8

Solution

  • Q1. Your sum() over () is correct Q2. Replace from input_table, in your first query, with :

    from (select date, sum(amount) as amount, team, month_id 
          from input_table 
          group by date, team, month_id
          ) as t