Search code examples
postgresqlgreatest-n-per-group

How can I retrieve the complete record with the maximun value grouped by another in Postgres?


I have a table (it's a big query in fact, so don't use joins over the table please) as follows:

date     | priority | data
20200301 | 1        | 0.3
20200301 | 2        | 0.4
20200302 | 2        | 0.4
20200302 | 3        | 0.1
20200303 | 1        | 0.8

So, I want the date and the data with the LOWEST priority of each date, so the result of the query I'm looking for would be:

date     | priority | data
20200301 | 1        | 0.3
20200302 | 2        | 0.4
20200303 | 1        | 0.8

Whenever I try to make a group by clause, that query cannot retrieve the data column nor support different values on the data column.


Solution

  • You can use a the row_number window function for this:

    CREATE TABLE t (
        "date"     INTEGER,
        "priority" INTEGER,
        "data"     FLOAT
    );
    
    INSERT INTO t
        ("date", "priority", "data")
    VALUES ('20200301', '1', '0.3')
         , ('20200301', '2', '0.4')
         , ('20200302', '2', '0.4')
         , ('20200302', '3', '0.1')
         , ('20200303', '1', '0.8');
    
    
    SELECT *
    FROM (
        SELECT *, row_number() OVER (PARTITION BY date ORDER BY priority)
        FROM t
    ) f
    WHERE row_number = 1
    

    returns:

    +--------+--------+----+----------+
    |date    |priority|data|row_number|
    +--------+--------+----+----------+
    |20200301|1       |0.3 |1         |
    |20200302|2       |0.4 |1         |
    |20200303|1       |0.8 |1         |
    +--------+--------+----+----------+ 
    

    As mentioned by @david in the comments, it might be more efficient to filter the rows based on "priority = min_priority_for_date" (instead of ranking them and filtering them afterwards):

    SELECT *
    FROM t
    WHERE (date, priority) IN (
        SELECT date, MIN(priority)
        FROM t
        GROUP BY date
    )