I need to count average price and group them by 2 columns. Then select top 2 values (PostgreSQL 10.1). E.g, I have the following structures:
------------------------------------------------------------------------------------------
category | shop_name | price | date |
MSI GeForce RTX 2080 |amazon | 62649 | 1/6/2019 |
MSI GeForce RTX 2080 |amazon | 58668 | 1/17/2019 |
MSI GeForce RTX 2080 |amazon | 62649 | 1/7/2019 |
MSI GeForce RTX 2080 |amazon | 60542 | 1/16/2019 |
MSI GeForce RTX 2080 |amazon | 62649 | 1/5/2019 |
MSI GeForce RTX 2080 |brandstar | 66456 | 1/16/2019 |
MSI GeForce RTX 2080 |brandstar | 66347 | 1/17/2019 |
MSI GeForce RTX 2080 |brandstar | 66456 | 1/16/2019 |
MSI GeForce RTX 2080 |brigo | 63300 | 1/17/2019 |
MSI GeForce RTX 2080 |brigo | 65330 | 1/16/2019 |
MSI GeForce RTX 2080 |brigo | 65330 | 1/16/2019 |
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/16/2019 |
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/17/2019 |
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/18/2019 |
Suppose I want to select top 2 average results for category and shop_name. So I expect the following result:
category | shop_name | price | date | avg |
MSI GeForce RTX 2080 |amazon | 62649 | 1/6/2019 | 61431.4 |1
MSI GeForce RTX 2080 |amazon | 58668 | 1/17/2019 | 61431.4 |1
MSI GeForce RTX 2080 |amazon | 62649 | 1/7/2019 | 61431.4 |1
MSI GeForce RTX 2080 |amazon | 60542 | 1/16/2019 | 61431.4 |1
MSI GeForce RTX 2080 |amazon | 62649 | 1/5/2019 | 61431.4 |1
MSI GeForce RTX 2080 |brandstar | 66456 | 1/16/2019 | 66419.66667 | 3
MSI GeForce RTX 2080 |brandstar | 66347 | 1/17/2019 | 66419.66667 | 3
MSI GeForce RTX 2080 |brandstar | 66456 | 1/16/2019 | 66419.66667 | 3
MSI GeForce RTX 2080 |brigo | 63300 | 1/17/2019 | 64653.33333 | 2
MSI GeForce RTX 2080 |brigo | 65330 | 1/16/2019 | 64653.33333 | 2
MSI GeForce RTX 2080 |brigo | 65330 | 1/16/2019 | 64653.33333 | 2
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/16/2019 | 65330 | 1
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/17/2019 | 65330 | 1
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/18/2019 | 65330 | 1
Then i'd like to choose rows where rank is less than 3.
But I get the following result:
---------------------------------------------------------------------------------------------
MSI GeForce RTX 2080 |amazon | 62649 | 1/6/2019 | 61431.4 | 1 |
MSI GeForce RTX 2080 |amazon | 58668 | 1/17/2019 | 61431.4 | 1 |
MSI GeForce RTX 2080 |amazon | 62649 | 1/7/2019 | 61431.4 | 1 |
MSI GeForce RTX 2080 |amazon | 60542 | 1/16/2019 | 61431.4 | 1 |
MSI GeForce RTX 2080 |amazon | 62649 | 1/5/2019 | 61431.4 | 1 |
MSI GeForce RTX 2080 |brandstar | 66456 | 1/16/2019 | 66419.66667 | 1 |
MSI GeForce RTX 2080 |brandstar | 66347 | 1/17/2019 | 66419.66667 | 1 |
MSI GeForce RTX 2080 |brandstar | 66456 | 1/16/2019 | 66419.66667 | 1 |
MSI GeForce RTX 2080 |brigo | 63300 | 1/17/2019 | 64653.33333 | 1 |
MSI GeForce RTX 2080 |brigo | 65330 | 1/16/2019 | 64653.33333 | 1 |
MSI GeForce RTX 2080 |brigo | 65330 | 1/16/2019 | 64653.33333 | 1 |
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/16/2019 | 65330 | 1
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/17/2019 | 65330 | 1
MSI GeForce RTX 2070 | fake_shop | 65330 | 1/18/2019 | 65330 | 1
Here is my SQL query:
SELECT tt.category,
tt.shop_name,
tt.price,
tt.updated,
tt.avg_price,
rank() OVER (PARTITION BY tt.category,
tt.shop_name,
tt.avg_price
ORDER BY tt.avg_price DESC)
FROM
( SELECT category,
LOWER(shop_name) AS shop_name,
CAST (price AS INTEGER) AS price,
DATE(updated) AS updated,
avg(price) OVER (PARTITION BY category,
LOWER(shop_name)) AS avg_price
FROM prices ) AS tt
Just use AVG() OVER ()
followed by DENSE_RANK()
:
WITH cte1 AS (
SELECT *, AVG(price) OVER (PARTITION BY category, shop_name) AS avg_price
FROM prices
), cte2 AS (
SELECT *, DENSE_RANK() OVER (PARTITION BY category ORDER BY avg_price) AS rnk
FROM cte1
)
SELECT *
FROM cte2
WHERE rnk <= 2
ORDER BY category, shop_name