Search code examples
sqlpostgresqlgroup-bygreatest-n-per-group

Postgres group by columns and within group select other columns by max aggregate


This is probably a standard problem, and I've keyed off some other answers, but so far been unable to resolve my current problem.

A              B             C
+----+-------+ +----+------+ +----+------+-------+
| id | start | | id | a_id | | id | b_id | name  |
+----+-------+ +----+------+ +----+------+-------+
|  1 |     1 | |  1 |    1 | |  1 |    1 | aname |
|  2 |     2 | |  2 |    1 | |  2 |    2 | aname |
+----+-------+ |  3 |    2 | |  3 |    3 | aname |
               +----+------+ |  4 |    3 | bname |
                             +----+------+-------+

In English what I'd like to accomplish is:

  1. For each c.name, select its newest entry based on the start time in a.start

The SQL I've tried is the following:

SELECT a.id, a.start, c.id, c.name 
FROM a
INNER JOIN (
    SELECT id, MAX(start) as start
    FROM a
    GROUP BY id
) a2 ON a.id = a2.id AND a.start = a2.start
JOIN b
ON a.id = b.a_id
JOIN c
on b.id = c.b_id
GROUP BY c.name;

It fails with errors such as:

ERROR: column "a.id" must appear in the GROUP BY clause or be used in an aggregate function Position: 8

To be useful I really need the ids from the query, but cannot group on them since they are unique. Here is an example of output I'd love for the first case above:

+------+---------+------+--------+
| a.id | a.start | c.id | c.name |
+------+---------+------+--------+
|    2 |       2 |    3 | aname  |
|    2 |       2 |    4 | bname  |
+------+---------+------+--------+

Here is a Sqlfiddle

Edit - removed second case


Solution

  • Case 1

    select distinct on (c.name)
        a.id, a.start, c.id, c.name
    from
        a
        inner join
        b on a.id = b.a_id
        inner join
        c on b.id = c.b_id
    order by c.name, a.start desc
    ;
     id | start | id | name  
    ----+-------+----+-------
      2 |     2 |  3 | aname
      2 |     2 |  4 | bname
    

    Case 2

    select distinct on (c.name)
        a.id, a.start, c.id, c.name
    from
        a
        inner join
        b on a.id = b.a_id
        inner join
        c on b.id = c.b_id
    where
        b.a_id in (
            select a_id
            from b
            group by a_id
            having count(*) > 1
        )
    order by c.name, a.start desc
    ;
     id | start | id | name  
    ----+-------+----+-------
      1 |     1 |  1 | aname