Search code examples
sqlpostgresqlgreatest-n-per-group

Use GROUP_BY with extra columns in postgreSQL


Let's suppose we have a simple table users:

id  | name | company |  other columns.......
----+------+---------+-------------------------
 1  | A    | A1
 2  | A    | A2
 3  | B    | B1
 4  | C    | C1
 5  | C    | C2
 6  | C    | C3
 ....

I want to group by name, and chose the latest value for id and company. The result I expect is three column table:

id  | name | company |
----+------+---------+
 2  | A    | A2
 3  | B    | B1
 6  | C    | C3
 ....

I am trying to use GROUP_BY, but don't know how to include company column:

SELECT
  max(id),
  name,
  ? # I don't know how to include company
FROM users
GROUP_BY name

Does anyone have better idea?


Solution

  • Use distinct on:

    select distinct on (name) u.*
    from users u
    order by name, id desc;
    

    distinct on is a very convenient Postgres extension. It returns the first row from a group of rows. The "grouping" is based on the columns after distinct on. The ordering is based on the order by clause.

    There are two other common ways to address this. One method uses window functions:

    select u.*
    from (select u.*,
                 row_number() over (partition by name order by id desc) as seqnum
          from users u
         ) u
    where seqnum = 1;
    

    Or a correlated subquery:

    select u.*
    from users u
    where u.id = (select max(u2.id) from users u2 where u2.name -= u.name);
    

    There is even a "clever" way of doing this using group by. Postgres doesn't have a "first" or "last" aggregation function. But you can use arrays:

    select name, max(id),
           (array_agg(company order by id desc))[1] as country
    from users u
    group by name;