Search code examples
sqldb2greatest-n-per-group

How to Calculate the Average Variation of a Value Based on Grouping by Multiple Columns, Using the Difference of Extreme Time Values in SQL?


Let's be the following table in a DB2 DBMS:

WITH DUMMY_TABLE ("ID", "NAME", "UNIX_TIMESTAMP", "SLOT", "AMOUNT") AS (
VALUES
    (1, 'dani', 1686311907, 0, 22.0),
    (2, 'dani', 1686311988, 0, 25.0),
    (3, 'dani', 1686320086, 1, 13.0),
    (4, 'dani', 1686320095, 1, 14.0),
    (5, 'kane', 1686311032, 0, 8.0),
    (6, 'kane', 1686311099, 0, 27.0),
    (7, 'kane', 1686322000, 1, 12.0),
    (8, 'kane', 1686322045, 1, 17.0)
)
SELECT * FROM DUMMY_TABLE
ID NAME UNIX_TIMESTAMP SLOT AMOUNT
1 dani 1686311907 0 22.0
2 dani 1686311988 0 25.0
3 dani 1686320086 1 13.0
4 dani 1686320095 1 14.0
5 kane 1686311032 0 8.0
6 kane 1686311099 0 27.0
7 kane 1686322000 1 12.0
8 kane 1686322045 1 17.0

What I want to do is to group the data by "NAME" and by "SLOT", and then, for each of these groupings, I want to calculate the average variation of the "AMOUNT" column. This average variation, by definition, is calculated as the quotient between the difference of the "AMOUNT" values between the two extreme times, and the difference between these two times (mean value theorem), per group.

To show an example of what I would like to achieve, let me use a pseudo-SQL query (syntactically it is not correct, but it can be used to illustrate what I intend to achieve):

SELECT
(AMOUNT[MAX(UNIX_TIMESTAMP)] - AMOUNT[MIN(UNIX_TIMESTAMP)]) / (MAX(UNIX_TIMESTAMP) - MIN(UNIX_TIMESTAMP))
FROM DUMMY_TABLE
GROUP BY NAME, SLOT

That is, for each grouping, you should obtain both its extreme values of time (MIN and MAX functions), as well as the corresponding values of the "AMOUNT" column for these extreme values of time, and with all this, perform the corresponding quotient. So far I have not been able to find a simple SQL function to solve this kind of common problem.


Solution

  • Something like this:

    WITH DUMMY_TABLE ("ID", "NAME", "UNIX_TIMESTAMP", "SLOT", "AMOUNT") AS (
    VALUES
        (1, 'dani', 1686311907, 0, 22.0),
        (2, 'dani', 1686311988, 0, 25.0),
        (3, 'dani', 1686320086, 1, 13.0),
        (4, 'dani', 1686320095, 1, 14.0),
        (5, 'kane', 1686311032, 0, 8.0),
        (6, 'kane', 1686311099, 0, 27.0),
        (7, 'kane', 1686322000, 1, 12.0),
        (8, 'kane', 1686322045, 1, 17.0)
    )
    SELECT name, slot, avg_var
    FROM
    (
    SELECT
      * 
    , 
      (
        last_value  (amount) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
      - first_value (amount) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
      )
      /
      nullif
      (
        last_value  (unix_timestamp) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
      - first_value (unix_timestamp) over (partition by name, slot order by unix_timestamp rows between unbounded preceding and unbounded following)
      , 0
      )
      as avg_var
    -- avg_var is the same for each group;
    -- just to get one row from each group  
    , row_number () over (partition by name, slot) as rn_
    FROM DUMMY_TABLE
    )
    WHERE rn_ = 1
    
    NAME SLOT AVG_VAR
    dani 0 0.0370370370370370370370370370
    dani 1 0.1111111111111111111111111111
    kane 0 0.2835820895522388059701492537
    kane 1 0.1111111111111111111111111111

    fiddle