Let's be the following table in a DB2 DBMS:
WITH DUMMY_TABLE ("ID", "NAME", "UNIX_TIMESTAMP", "SLOT", "AMOUNT") AS (
VALUES
(1, 'dani', 1686311907, 0, 22.0),
(2, 'dani', 1686311988, 0, 25.0),
(3, 'dani', 1686320086, 1, 13.0),
(4, 'dani', 1686320095, 1, 14.0),
(5, 'kane', 1686311032, 0, 8.0),
(6, 'kane', 1686311099, 0, 27.0),
(7, 'kane', 1686322000, 1, 12.0),
(8, 'kane', 1686322045, 1, 17.0)
)
SELECT * FROM DUMMY_TABLE
ID | NAME | UNIX_TIMESTAMP | SLOT | AMOUNT |
---|---|---|---|---|
1 | dani | 1686311907 | 0 | 22.0 |
2 | dani | 1686311988 | 0 | 25.0 |
3 | dani | 1686320086 | 1 | 13.0 |
4 | dani | 1686320095 | 1 | 14.0 |
5 | kane | 1686311032 | 0 | 8.0 |
6 | kane | 1686311099 | 0 | 27.0 |
7 | kane | 1686322000 | 1 | 12.0 |
8 | kane | 1686322045 | 1 | 17.0 |
What I want to do is to group the data by "NAME" and by "SLOT", and then, for each of these groupings, I want to calculate the average variation of the "AMOUNT" column. This average variation, by definition, is calculated as the quotient between the difference of the "AMOUNT" values between the two extreme times, and the difference between these two times (mean value theorem), per group.
To show an example of what I would like to achieve, let me use a pseudo-SQL query (syntactically it is not correct, but it can be used to illustrate what I intend to achieve):
SELECT
(AMOUNT[MAX(UNIX_TIMESTAMP)] - AMOUNT[MIN(UNIX_TIMESTAMP)]) / (MAX(UNIX_TIMESTAMP) - MIN(UNIX_TIMESTAMP))
FROM DUMMY_TABLE
GROUP BY NAME, SLOT
That is, for each grouping, you should obtain both its extreme values of time (MIN and MAX functions), as well as the corresponding values of the "AMOUNT" column for these extreme values of time, and with all this, perform the corresponding quotient. So far I have not been able to find a simple SQL function to solve this kind of common problem.
Something like this:
WITH DUMMY_TABLE ("ID", "NAME", "UNIX_TIMESTAMP", "SLOT", "AMOUNT") AS (
VALUES
(1, 'dani', 1686311907, 0, 22.0),
(2, 'dani', 1686311988, 0, 25.0),
(3, 'dani', 1686320086, 1, 13.0),
(4, 'dani', 1686320095, 1, 14.0),
(5, 'kane', 1686311032, 0, 8.0),
(6, 'kane', 1686311099, 0, 27.0),
(7, 'kane', 1686322000, 1, 12.0),
(8, 'kane', 1686322045, 1, 17.0)
)
SELECT name, slot, avg_var
FROM
(
SELECT
*
,
(
last_value (amount) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
- first_value (amount) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
)
/
nullif
(
last_value (unix_timestamp) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
- first_value (unix_timestamp) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
, 0
)
as avg_var
-- avg_var is the same for each group;
-- just to get one row from each group
, row_number () over (partition by name, slot) as rn_
FROM DUMMY_TABLE
)
WHERE rn_ = 1
NAME | SLOT | AVG_VAR |
---|---|---|
dani | 0 | 0.0370370370370370370370370370 |
dani | 1 | 0.1111111111111111111111111111 |
kane | 0 | 0.2835820895522388059701492537 |
kane | 1 | 0.1111111111111111111111111111 |