Search code examples
sqlpostgresqldatetimegroup-bygreatest-n-per-group

How can I find the item id related to the maximum resource by year in SQL (postgres)?


I'm trying to run a SQL query that seems easy but I don't understand what mechanism I should use to achieve this. I have the following table:

| id | playcount | release_date |
| -- | --------- | ------------ |
| 1  | 47712     | 2019-12-27   |
| 2  | 626778    | 2017-07-23   |
| 3  | 827091    | 2019-09-12   |
| 4  | 66419     | 2015-09-05   |
| 5  | 58627     | 2016-09-12   |
| 6  | 60272     | 2017-09-06   |
| 7  | 1762582   | 2017-10-07   |
| 8  | 11280     | 2016-10-12   |
| 9  | 30890     | 2019-10-29   |
| 10 | 715173    | 2019-07-02   |

I would like to find which id holds the maximum playcount per year. In this example, the id that has the maximum playcount in 2019 is 3.

And I'm looking for the following output:

playcount_table
| year | playcount | id |
| ---- | --------- | -- |
| 2019 | 827091    | 3  |
| 2017 | 1762582   | 7  |
| 2016 | 58627     | 5  |
| 2015 | 66419     | 4  |

I succeeded to get the maximum playcount by year with the following query:

select to_char(date_trunc('year', TO_DATE(p_table.release_date, 'YYYY-MM-DD')), 'YYYY-MM-DD') as year,
       max(p_table.playcount) as playcounts
from playcount_table as p_table
group by year;

But I can't retrieve the id related to this max playcount.

Can you help me with this ?

Thank you in advance,


Solution

  • Using ROW_NUMBER:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY EXTRACT(year FROM release_date)
                                     ORDER BY playcount DESC) rn
        FROM playcount_table
    )
    
    SELECT
        EXTRACT(year FROM release_date) year
        playcount,
        id
    FROM cte
    WHERE
        rn = 1
    ORDER BY
        EXTRACT(year FROM release_date) DESC;
    

    If a given year could have two or more records tied for the greatest play count, and you would want to report both of them, then replace ROW_NUMBER above with RANK.

    We might also be able to use DISTINCT ON here:

    SELECT DISTINCT ON (EXTRACT(year FROM release_date))
        EXTRACT(year FROM release_date),
        playcount,
        id
    FROM playcount_table
    ORDER BY
        EXTRACT(year FROM release_date),
        playcount DESC;