sql postgresql greatest-n-per-group window-functions

6 top min average elements postgresql

I need to count average price and group them by 2 columns. Then select top 2 values (PostgreSQL 10.1). E.g, I have the following structures:

------------------------------------------------------------------------------------------
        category        | shop_name |     price |      date     |
MSI GeForce RTX 2080    |amazon     |   62649   |   1/6/2019    |   
MSI GeForce RTX 2080    |amazon     |   58668   |   1/17/2019   |   
MSI GeForce RTX 2080    |amazon     |   62649   |   1/7/2019    |   
MSI GeForce RTX 2080    |amazon     |   60542   |   1/16/2019   |   
MSI GeForce RTX 2080    |amazon     |   62649   |   1/5/2019    |   
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   
MSI GeForce RTX 2080    |brandstar  |   66347   |   1/17/2019   |   
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   
MSI GeForce RTX 2080    |brigo      |   63300   |   1/17/2019   |   
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/16/2019   |
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/17/2019   |
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/18/2019   |

Suppose I want to select top 2 average results for category and shop_name. So I expect the following result:

        category        | shop_name |     price |      date     |     avg   |
MSI GeForce RTX 2080    |amazon     |   62649   |   1/6/2019    |   61431.4 |1
MSI GeForce RTX 2080    |amazon     |   58668   |   1/17/2019   |   61431.4 |1  
MSI GeForce RTX 2080    |amazon     |   62649   |   1/7/2019    |   61431.4 |1  
MSI GeForce RTX 2080    |amazon     |   60542   |   1/16/2019   |   61431.4 |1  
MSI GeForce RTX 2080    |amazon     |   62649   |   1/5/2019    |   61431.4 |1  
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |  3
MSI GeForce RTX 2080    |brandstar  |   66347   |   1/17/2019   |   66419.66667 |  3
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |  3
MSI GeForce RTX 2080    |brigo      |   63300   |   1/17/2019   |   64653.33333 |  2
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |  2
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |  2
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/16/2019   |   65330   | 1
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/17/2019   |   65330   | 1
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/18/2019   |   65330   | 1

Then i'd like to choose rows where rank is less than 3.

But I get the following result:

    ---------------------------------------------------------------------------------------------
    MSI GeForce RTX 2080    |amazon     |   62649   |   1/6/2019    |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   58668   |   1/17/2019   |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   62649   |   1/7/2019    |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   60542   |   1/16/2019   |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   62649   |   1/5/2019    |   61431.4 |   1   |
    MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |   1   |
    MSI GeForce RTX 2080    |brandstar  |   66347   |   1/17/2019   |   66419.66667 |   1   |
    MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |   1   |
    MSI GeForce RTX 2080    |brigo      |   63300   |   1/17/2019   |   64653.33333 |   1   |
    MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |   1   |
    MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |   1   |
    MSI GeForce RTX 2070    | fake_shop |   65330   |   1/16/2019   |   65330   | 1
    MSI GeForce RTX 2070    | fake_shop |   65330   |   1/17/2019   |   65330   | 1
    MSI GeForce RTX 2070    | fake_shop |   65330   |   1/18/2019   |   65330   | 1

Here is my SQL query:

SELECT tt.category,
       tt.shop_name,
       tt.price,
       tt.updated,
       tt.avg_price,
       rank() OVER (PARTITION BY tt.category,
                                 tt.shop_name,
                                 tt.avg_price
                    ORDER BY tt.avg_price DESC)
FROM
  ( SELECT category,
           LOWER(shop_name) AS shop_name,
           CAST (price AS INTEGER) AS price,
                DATE(updated) AS updated,
                avg(price) OVER (PARTITION BY category,
                                              LOWER(shop_name)) AS avg_price
   FROM prices ) AS tt

Solution

Just use AVG() OVER () followed by DENSE_RANK():

WITH cte1 AS (
    SELECT *, AVG(price) OVER (PARTITION BY category, shop_name) AS avg_price
    FROM prices
), cte2 AS (
    SELECT *, DENSE_RANK() OVER (PARTITION BY category ORDER BY avg_price) AS rnk
    FROM cte1
)
SELECT *
FROM cte2
WHERE rnk <= 2
ORDER BY category, shop_name