Let's suppose I have the following table.
CREATE TABLE transaction (
"ID" INTEGER PRIMARY KEY,
"NAME" VARCHAR(4),
"TIMESTAMP" INTEGER,
"QUANTITY" INTEGER
);
INSERT INTO transaction
("ID", "NAME", "TIMESTAMP", "QUANTITY")
VALUES
('1', 'dani', '1686311907', '1'),
('2', 'dani', '1686312071', '4'),
('3', 'dani', '1686748928', '2'),
('4', 'pet', '1687937005', '2'),
('5', 'pet', '1688109281', '6');
Which corresponds to:
| ID | NAME | TIMESTAMP | QUANTITY |
| --- | ---- | ---------- | -------- |
| 1 | dani | 1686311907 | 1 |
| 2 | dani | 1686312071 | 4 |
| 3 | dani | 1686748928 | 2 |
| 4 | pet | 1687937005 | 2 |
| 5 | pet | 1688109281 | 6 |
This table describes a series of transactions. In general, what I want is, for each name and for each timestamp, to calculate a cumulative sum of the quantities within each working slot. Each working slot will include those timestamps that are within a time interval of less than 10800 seconds (equivalent to 3 hours). In other words, the cumulative sum must be reset, for each name, on each working slot.
What I would expect to get would be a table like the following:
| ID | NAME | TIMESTAMP | QUANTITY | CUM_QUANTITY |
| --- | ---- | ---------- | -------- | ------------
| 1 | dani | 1686311907 | 1 | 1 |
| 2 | dani | 1686312071 | 4 | 5 |
| 3 | dani | 1686748928 | 2 | 2 | # new working slot
| 4 | pet | 1687937005 | 2 | 2 | # new name
| 5 | pet | 1688109281 | 6 | 6 | # new working slot
So far I've tried to add a transitional column to check if the record belongs to a new day:
select
*,
CASE WHEN coalesce(
"TIMESTAMP" - lag("TIMESTAMP", 1) over (
partition by "NAME"
order by
"TIMESTAMP" asc
),
0
) < 3600 * 3 THEN 0 ELSE 1 END AS NEW_DAY
from
transaction
ORDER BY
"NAME",
"TIMESTAMP"
Which give us the following table:
| ID | NAME | TIMESTAMP | QUANTITY | new_day |
| --- | ---- | ---------- | -------- | ------- |
| 1 | dani | 1686311907 | 1 | 0 |
| 2 | dani | 1686312071 | 4 | 0 |
| 3 | dani | 1686748928 | 2 | 1 |
| 4 | pet | 1687937005 | 2 | 0 |
| 5 | pet | 1688109281 | 6 | 1 |
And then perform some kind of window by using NAME and new_day and aggregating by SUM(QUANTITY), but this approach doesn't work because new_day only takes the values 0 and 1, and it should take from 0, 1, ... to the maximum working slots per user.
Does anybody know how to proceed?
Refer the OLAP specification topic on the RANGE
OLAP function use.
WITH transaction ("ID", "NAME", "TIMESTAMP", "QUANTITY") AS
(
VALUES
(1, 'dani', 1686311907, 1),
(2, 'dani', 1686312071, 4),
(3, 'dani', 1686748928, 2),
(4, 'pet', 1687937005, 2),
(5, 'pet', 1688109281, 6)
)
SELECT
*
, sum ("QUANTITY") over (partition by "NAME" order by "TIMESTAMP" range between 10800 preceding and current row)
as CUM_QUANTITY
FROM transaction
ID | NAME | TIMESTAMP | QUANTITY | CUM_QUANTITY |
---|---|---|---|---|
1 | dani | 1686311907 | 1 | 1 |
2 | dani | 1686312071 | 4 | 5 |
3 | dani | 1686748928 | 2 | 2 |
4 | pet | 1687937005 | 2 | 2 |
5 | pet | 1688109281 | 6 | 6 |
Update:
With the WORKING_SLOT
artificial column.
Its computation inside of each name is based on:
The result is the same on this dataset, but may be different on others.
WITH
transaction ("ID", "NAME", "TIMESTAMP", "QUANTITY") AS
(
VALUES
(1, 'dani', 1686311907, 1),
(2, 'dani', 1686312071, 4),
(3, 'dani', 1686748928, 2),
(4, 'pet', 1687937005, 2),
(5, 'pet', 1688109281, 6)
)
, transaction_with_working_slot AS
(
SELECT
*
, SUM
(
CASE
WHEN "TIMESTAMP" - COALESCE (LAG ("TIMESTAMP") OVER (PARTITION BY "NAME" ORDER BY "TIMESTAMP"), 0) < 10800
THEN 0
ELSE 1
END
) OVER (PARTITION BY "NAME" ORDER BY "TIMESTAMP")
AS WORKING_SLOT
FROM transaction
)
SELECT
*
, SUM ("QUANTITY") OVER (PARTITION BY "NAME", "WORKING_SLOT" ORDER BY "TIMESTAMP")
AS CUM_QUANTITY
FROM transaction_with_working_slot
ORDER BY "NAME", "TIMESTAMP"
ID | NAME | TIMESTAMP | QUANTITY | WORKING_SLOT | CUM_QUANTITY |
---|---|---|---|---|---|
1 | dani | 1686311907 | 1 | 1 | 1 |
2 | dani | 1686312071 | 4 | 1 | 5 |
3 | dani | 1686748928 | 2 | 2 | 2 |
4 | pet | 1687937005 | 2 | 1 | 2 |
5 | pet | 1688109281 | 6 | 2 | 6 |