Search code examples
sqldatabasedb2aggregate-functions

SQL: How to calculate a cumulative sum that resets per time slot and name?


Let's suppose I have the following table.

CREATE TABLE transaction (
  "ID" INTEGER PRIMARY KEY,
  "NAME" VARCHAR(4),
  "TIMESTAMP" INTEGER,
  "QUANTITY" INTEGER
);

INSERT INTO transaction
  ("ID", "NAME", "TIMESTAMP", "QUANTITY")
VALUES
  ('1', 'dani', '1686311907', '1'),
  ('2', 'dani', '1686312071', '4'),
  ('3', 'dani', '1686748928', '2'),
  ('4', 'pet', '1687937005', '2'),
  ('5', 'pet', '1688109281', '6');

Which corresponds to:

| ID  | NAME | TIMESTAMP  | QUANTITY |
| --- | ---- | ---------- | -------- |
| 1   | dani | 1686311907 | 1        |
| 2   | dani | 1686312071 | 4        |
| 3   | dani | 1686748928 | 2        |
| 4   | pet  | 1687937005 | 2        |
| 5   | pet  | 1688109281 | 6        |

This table describes a series of transactions. In general, what I want is, for each name and for each timestamp, to calculate a cumulative sum of the quantities within each working slot. Each working slot will include those timestamps that are within a time interval of less than 10800 seconds (equivalent to 3 hours). In other words, the cumulative sum must be reset, for each name, on each working slot.

What I would expect to get would be a table like the following:

| ID  | NAME | TIMESTAMP  | QUANTITY | CUM_QUANTITY |
| --- | ---- | ---------- | -------- | ------------
| 1   | dani | 1686311907 | 1        | 1            |
| 2   | dani | 1686312071 | 4        | 5            |
| 3   | dani | 1686748928 | 2        | 2            | # new working slot
| 4   | pet  | 1687937005 | 2        | 2            | # new name
| 5   | pet  | 1688109281 | 6        | 6            | # new working slot

So far I've tried to add a transitional column to check if the record belongs to a new day:

select 
  *, 
  CASE WHEN coalesce(
    "TIMESTAMP" - lag("TIMESTAMP", 1) over (
      partition by "NAME" 
      order by 
        "TIMESTAMP" asc
    ), 
    0
  ) < 3600 * 3 THEN 0 ELSE 1 END AS NEW_DAY 
from 
  transaction 
ORDER BY 
  "NAME", 
  "TIMESTAMP"

Which give us the following table:

| ID  | NAME | TIMESTAMP  | QUANTITY | new_day |
| --- | ---- | ---------- | -------- | ------- |
| 1   | dani | 1686311907 | 1        | 0       |
| 2   | dani | 1686312071 | 4        | 0       |
| 3   | dani | 1686748928 | 2        | 1       |
| 4   | pet  | 1687937005 | 2        | 0       |
| 5   | pet  | 1688109281 | 6        | 1       |

And then perform some kind of window by using NAME and new_day and aggregating by SUM(QUANTITY), but this approach doesn't work because new_day only takes the values 0 and 1, and it should take from 0, 1, ... to the maximum working slots per user.

Does anybody know how to proceed?


Solution

  • Refer the OLAP specification topic on the RANGE OLAP function use.

    WITH transaction ("ID", "NAME", "TIMESTAMP", "QUANTITY") AS 
    (
    VALUES
      (1, 'dani', 1686311907, 1),
      (2, 'dani', 1686312071, 4),
      (3, 'dani', 1686748928, 2),
      (4, 'pet',  1687937005, 2),
      (5, 'pet',  1688109281, 6)
    )
    SELECT 
      * 
    , sum ("QUANTITY") over (partition by "NAME" order by "TIMESTAMP" range between 10800 preceding and current row)
      as CUM_QUANTITY
    FROM transaction
    
    ID NAME TIMESTAMP QUANTITY CUM_QUANTITY
    1 dani 1686311907 1 1
    2 dani 1686312071 4 5
    3 dani 1686748928 2 2
    4 pet 1687937005 2 2
    5 pet 1688109281 6 6

    fiddle

    Update:
    With the WORKING_SLOT artificial column.
    Its computation inside of each name is based on:

    • calculation of "flag" value for each row - 0 when a difference between current and last timestamp is less then 3 hours and 1 otherwise
    • cumulative sum of these flags

    The result is the same on this dataset, but may be different on others.

    WITH 
    transaction ("ID", "NAME", "TIMESTAMP", "QUANTITY") AS 
    (
    VALUES
      (1, 'dani', 1686311907, 1),
      (2, 'dani', 1686312071, 4),
      (3, 'dani', 1686748928, 2),
      (4, 'pet',  1687937005, 2),
      (5, 'pet',  1688109281, 6)
    )
    , transaction_with_working_slot AS 
    (
    SELECT 
      * 
    , SUM 
    (
      CASE 
        WHEN "TIMESTAMP" - COALESCE (LAG ("TIMESTAMP") OVER (PARTITION BY "NAME" ORDER BY "TIMESTAMP"), 0) < 10800
        THEN 0
        ELSE 1
      END
    ) OVER (PARTITION BY "NAME" ORDER BY "TIMESTAMP")
      AS WORKING_SLOT
    FROM transaction
    )
    SELECT 
      *
    , SUM ("QUANTITY") OVER (PARTITION BY "NAME", "WORKING_SLOT" ORDER BY "TIMESTAMP")
      AS CUM_QUANTITY
    FROM transaction_with_working_slot
    ORDER BY "NAME", "TIMESTAMP"
    
    ID NAME TIMESTAMP QUANTITY WORKING_SLOT CUM_QUANTITY
    1 dani 1686311907 1 1 1
    2 dani 1686312071 4 1 5
    3 dani 1686748928 2 2 2
    4 pet 1687937005 2 1 2
    5 pet 1688109281 6 2 6

    fiddle