postgresql aggregate greatest-n-per-group window-functions gaps-and-islands

How to include the first row from the next group in an aggregation?

I have a table in Postgres with categories and values. I want to perform an aggregation, e.g. avg(value) per category but including the first row of the next category in the aggregation.

Sample table:

id  category  value
-------------------
1   1         5.4
2   1         2.1
3   2         1.0
4   2         2.6
5   2         0.3
6   3         4.4
7   3         3.8

id is a primary key and provides an order. Categories are grouped within the order and consecutive.
It would be acceptable (but not necessary) to create an intermediate table like this, which is duplicating the adjacent row:

id  category  value
-------------------
1   1         5.4
2   1         2.1
3   1         1.0  <-- new row
4   2         1.0
5   2         2.6
6   2         0.3
7   2         4.4  <-- new row
8   3         4.4
9   3         3.8

... and then do:

select category, avg(value) group by category from sample_table

How can this be achieved with SQL statements?

I suspect this could be done with window functions and some sophisticated frame clause like GROUPS, but I don't know how. ( See https://www.postgresql.org/docs/12/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS )

Solution

You confirmed that category numbers are increasing by 1 steadily, no gaps.
Here is a simple approach for the simple case:

SELECT category, avg(value)
FROM  (
   SELECT category, value
   FROM   tbl

   UNION ALL
   (  -- parentheses required
   SELECT DISTINCT ON (category)
          category - 1, value
   FROM   tbl
   WHERE  category > (SELECT min(category) FROM tbl)  -- eliminate corner case
   ORDER  BY category, id
   )   
   ) sub
GROUP  BY 1
ORDER  BY 1;

The second term after UNION ALL adds the row like you suggested: I take the first row of each group and simply subtract 1 from the category.

Corner case: adds a new category with min(category) - 1. Can be eliminated easily ...

Generic solution for any kind of categories (as long as the order is defined):

SELECT category, avg(value)
FROM  (
   SELECT category, value
   FROM   tbl

   UNION ALL
   SELECT lag(category) OVER (ORDER BY category), value
   FROM  (
      SELECT DISTINCT ON (category)
             category, value
      FROM   tbl
      ORDER  BY category, id
      ) unicat
   ) sub
WHERE  category IS NOT NULL  -- eliminate corner case
GROUP  BY 1
ORDER  BY 1;

The first value of each group is added to to the previous category using the window function lag().

About DISTINCT ON:

Select first row in each GROUP BY group?