Search code examples
sqlitegroup-byaggregate-functionswindow-functionsdate-arithmetic

How can I construct a windowed aggregate query over an acceptable time window delta increment


I am recording user events into some SQLite tables.

Currently the tables are counting interactions over buckets of 1 minute which is updated on insertion.

The schema of the tables is like this:

CREATE TABLE table1(
                window DATETIME,
                counter INTEGER NOT NULL DEFAULT 1,
                value INTEGER NOT NULL DEFAULT 0,
                file_id INTEGER NOT NULL,
                PRIMARY KEY(window,file_id)
            );

CREATE TABLE table2(
                window DATETIME,
                counter INTEGER NOT NULL DEFAULT 1,
                value INTEGER NOT NULL DEFAULT 0,
                file_id INTEGER NOT NULL,
                PRIMARY KEY(window,file_id)
            );

CREATE TABLE files(name TEXT PRIMARY KEY NOT NULL ON CONFLICT REPLACE DEFAULT 'NA', touched DATE DEFAULT (date('now')));

Insertion to the tables is performed like this:

INSERT OR REPLACE INTO files(ROWID, name, touched)
    VALUES(
        (select ROWID FROM files WHERE name=IFNULL(?1,'NA') UNION SELECT max(ROWID) + 1 FROM files limit 1),
        IFNULL(?1,'NA'), 
        date()
    )

INSERT INTO table1(window,file_id,value)
        VALUES(
            datetime(strftime('%s', 'now') - (strftime('%s', 'now') % 60), 'unixepoch', 'localtime'),
            (SELECT rowid FROM files WHERE name=IFNULL(:fname,'NA')),
            :delta
        ) ON CONFLICT(window,file_id) DO
            UPDATE SET count = count + 1, value = value + :delta

Some example data would look like this:

select * from table1

window               file_id                 counter      value
-------------------  -------------------  -----------  ----------
2022-06-13 10:26:00  1                        29           3         
2022-06-13 10:27:00  2                        99           7
2022-06-13 10:30:00  3                        1            22
2022-06-13 10:31:00  4                        1            22
2022-06-13 10:34:00  1                        1            22
2022-06-13 10:39:00  1                        1            22
2022-06-13 10:40:00  1                        1            22
2022-06-13 10:53:00  1                        1            22
              

What I want to do is aggregate the "counter" and "value" columns based on active sessions. The session is considered active so long as the previous activity (row window) was within 10 minutes of the current activity (row windows separated by less than 10 minutes).

In this example, rows 1-7 belong to the same session since each activity (row window) is less than 10 minutes away from the previous activity. The last row defines a new session as it is 13 minutes after the previous activity ( 13 minutes = (10:53 - 10:40) is greater than 10 minutes).

This query should simultaneously aggregate from table2 (they are recorded at the same time but reflect different data)

My desired output should include data like this (per the rows 1-7 and row 8)

session 1:
started at: 2022-06-13 10:26:00
ended at:  2022-06-13 10:40:00
counter_sum: 22 + 99 + 1 + 1 + 1 + ... +1 (the sum of count column)
value_sum: 3 + 7 + 22 + 22 .... + 22 (the sum of value column)

session 2:
start at: 2022-06-13 10:53:00
finished at: 2022-06-13 10:53:00 (since no data after this assume same time)
count_sum: 1
value_sum: 22

How would you construct this type of windowing query?


Solution

  • You need a column that marks the active groups, like:

    SELECT *, SUM(flag) OVER (ORDER BY window) grp
    FROM (
      SELECT *, strftime('%s', window) - COALESCE(strftime('%s', LAG(window) OVER (ORDER BY window)), 0) > 600 flag
      FROM table1
    );
    

    and then you can aggregate based on that column like:

    WITH cte AS (
      SELECT *, SUM(flag) OVER (ORDER BY window) grp
      FROM (
        SELECT *, strftime('%s', window) - COALESCE(strftime('%s', LAG(window) OVER (ORDER BY window)), 0) > 600 flag
        FROM table1
      )
    )
    SELECT grp,
           MIN(window) min_window,
           MAX(window) max_window,
           SUM(counter) sum_counter,
           SUM(value) sum_value
    FROM cte
    GROUP BY grp; 
    
       
    

    See the demo.