I have a table in Postgres with categories and values. I want to perform an aggregation, e.g. avg(value) per category but including the first row of the next category in the aggregation.
Sample table:
id category value
-------------------
1 1 5.4
2 1 2.1
3 2 1.0
4 2 2.6
5 2 0.3
6 3 4.4
7 3 3.8
id
is a primary key and provides an order. Categories are grouped within the order and consecutive.
It would be acceptable (but not necessary) to create an intermediate table like this, which is duplicating the adjacent row:
id category value
-------------------
1 1 5.4
2 1 2.1
3 1 1.0 <-- new row
4 2 1.0
5 2 2.6
6 2 0.3
7 2 4.4 <-- new row
8 3 4.4
9 3 3.8
... and then do:
select category, avg(value) group by category from sample_table
How can this be achieved with SQL statements?
I suspect this could be done with window functions and some sophisticated frame clause like GROUPS, but I don't know how. ( See https://www.postgresql.org/docs/12/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS )
You confirmed that category numbers are increasing by 1 steadily, no gaps.
Here is a simple approach for the simple case:
SELECT category, avg(value)
FROM (
SELECT category, value
FROM tbl
UNION ALL
( -- parentheses required
SELECT DISTINCT ON (category)
category - 1, value
FROM tbl
WHERE category > (SELECT min(category) FROM tbl) -- eliminate corner case
ORDER BY category, id
)
) sub
GROUP BY 1
ORDER BY 1;
The second term after UNION ALL
adds the row like you suggested: I take the first row of each group and simply subtract 1 from the category.
Corner case: adds a new category with min(category) - 1
. Can be eliminated easily ...
Generic solution for any kind of categories (as long as the order is defined):
SELECT category, avg(value)
FROM (
SELECT category, value
FROM tbl
UNION ALL
SELECT lag(category) OVER (ORDER BY category), value
FROM (
SELECT DISTINCT ON (category)
category, value
FROM tbl
ORDER BY category, id
) unicat
) sub
WHERE category IS NOT NULL -- eliminate corner case
GROUP BY 1
ORDER BY 1;
The first value of each group is added to to the previous category using the window function lag()
.
About DISTINCT ON
: