I have a query something like this:
SELECT product_id,
site,
category_id,
session_time,
sum(cast(coalesce("#clicks",
0) AS bigint)) AS clicks
FROM df
WHERE site IN ('com', 'co')
AND session_time = DATE('2020-02-27')
GROUP BY product_id, site, session_time, category_id
ORDER BY clicks desc
LIMIT 10
But now, I want to see the top 10 product_id for each site and category_id based on the clicks. When I write the LIMIT function, it only shows the top 10 products but it does not group it by category_id and shop_id.
How can I do this?
Use window functions. You can RANK()
records by descending clicks
within site
/category
partitions in a subquery, and then filter in the outer query:
SELECT *
FROM (
SELECT
product_id,
site,
category_id,
session_time,
SUM("#clicks") clicks,
RANK() OVER(PARTITION BY site, category_id ORDER BY sum("#clicks") DESC) rn
FROM df
WHERE
site IN ('com', 'co')
AND session_time = DATE('2020-02-27')
GROUP BY product_id, site, session_time, category_id
) t
WHERE rn <= 10
ORDER BY site, category, clicks desc
I am unclear on why you need the coalesce()
/cast()
logic in the sum()
(just like other aggregate functions, sum()
ignore null
values, and it seems like #clicks
is a number already), so I removed it - you can add it back if you do need it, for some reason that I could not think of.