Search code examples
sqlpostgresqlgroup-by

Postgres Overlapping Groups/Reusing rows between groups


I have a table ordered by some column A. I would like to perform an aggregation on n rows at a time but I'd also like to incorporate the previous k and the next k' rows as context.

For example, let's say I have the following data ordered by A.

A B
0 2
1 3
2 7
3 4
4 3
5 2

aggregating n rows is as simple as group by A / n. For the case n=2 that would give us the groups (on A) [0, 1], [2, 3], [4, 5]. An example query may be to get the sum:

with my_table(A, B) as (
    values
    (0, 2),
    (1, 3),
    (2, 7),
    (3, 4),
    (4, 3),
    (5, 2)
)
select A / 2 as A_2, sum(B)
from my_table
group by A / 2

outputs:

A_2 sum(B)
0 5
1 11
2 5

What I find difficult is to get overlapping groups. Let's say k = k' = 2. Then I'd like to get the groups (on A): [0, 1, 2, 3], [0, 1, 2, 3, 4, 5], [2, 3, 4, 5] Is there a way to do so in postgres? either through group by or window functions? I'd like to do something like:

with my_table(A, B) as (
    values
    (0, 2),
    (1, 3),
    (2, 7),
    (3, 4),
    (4, 3),
    (5, 2)
)
select ??? as ???, sum(B)
from my_table
group by ???

giving an output.

A_??? B
0 16
1 21
2 16

sum is used here as just an example. I'd like to do any arbitrary aggregate. It also does not have to be in a single CTE.

I think you would call such a thing a sliding window with overlap


Solution

  • You can use a window function for this, then filter afterwards using modulo division.

    with my_table(A, B) as (
        values
        (0, 2),
        (1, 3),
        (2, 7),
        (3, 4),
        (4, 3),
        (5, 2)
    ),
    running as (
        select *,
          sum(B) over (order by A rows between 2 preceding and 3 following) as running_sum
        from my_table
    )
    select *
    from running
    where A % 2 = 0;
    

    db<>fiddle